document categorization using solr?

2010-03-25 Thread Joel Nylund

Hi,

Does Solr have something built in, or a recommended add-on, that does
document categorization? (I found a thread from about a year ago, but not
on exactly the same topic.)


For example, here is a commercial categorization product that will  
take a website and categorize it


http://grapeshot.co.uk/online-demo-3.php?url=http://www.solutionstreet.com

I am looking for something similar that works with Solr/Lucene and is  
open source based.


Seems like Weka (http://weka.wikispaces.com/Frequently+Asked+Questions)
might be close, but I'm not sure. I'm also not sure how to come
up with a category list.


thanks
Joel



Re: weird sorting behavior

2009-12-31 Thread Joel Nylund

Hi,

After some further investigation, it turns out that null fields were
sorting first, so if the title was null it was coming up first. This
is true even with 1.5 and collatedROOT. (I tried on last night's build.)


So let me change my question: how do I make items with null values
sort last?


thanks
Joel

On Dec 30, 2009, at 3:11 PM, Joel Nylund wrote:


Hi, so this is only available in 1.5?

I tried in 1.4 and got :

org.apache.solr.common.SolrException: Error loading class  
'solr.CollationKeyFilterFactory'


Is there a way to do this in 1.4?

The link Shalin sent is a 1.5 link I think.

thanks
Joel

On Dec 25, 2009, at 10:52 PM, Robert Muir wrote:

Hello, as Shalin said, you might want to try  
CollationKeyFilterFactory.


Below is an example (using the multilingual root locale), where the
spaces will sort after the letters and numbers as you mentioned, but
it will still not be case-sensitive. This is because strength is
'secondary'.

But are you really sure you want the spaces sorted after the letters
and numbers? Or instead do you just want them ignored for sorting? If
this is the case, then try 'primary', so that spaces, punctuation,
accents and things like that, in addition to case, are ignored in the
sort: for example "Test-1234" and "  test1234" sort the same with
primary, but not with secondary (the one with leading spaces will
sort last).

If all else fails, you can write custom rules for it too, as Shalin
mentioned.


<fieldType name="collatedROOT" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.CollationKeyFilterFactory"
            language=""
            strength="secondary"
    />
  </analyzer>
</fieldType>

On Fri, Dec 25, 2009 at 5:37 AM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:


On Thu, Dec 24, 2009 at 11:51 PM, Joel Nylund jnyl...@yahoo.com  
wrote:


update, I tried changing to datatype string, and it sorts the numerics
better, but the other sorts are not as good.

Is there a way to control sorting for special chars, for example, I want
blanks to sort after letters and numbers.


In the general case, CollationKeyFilterFactory will do the trick. You could
create a custom rule set which sorts spaces after letters and numbers. See

http://wiki.apache.org/solr/UnicodeCollation



using alphaOnlySort - sorts nicely for alpha, but numbers don't work
string - sorts nicely for numbers and letters, but special chars like
blanks show up first in the list


alphaOnlySort has a PatternReplaceFilterFactory which removes all characters
except a-z. This is the reason behind those weird results. You could try
removing that filter and see if that's what you need.

--
Regards,
Shalin Shekhar Mangar.
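
Shalin's point about alphaOnlySort can be illustrated with a rough emulation of the analysis chain (lowercase, trim, then strip everything that is not a-z). This is only a sketch in Python, not Solr code; the function name and the sample titles are made up for illustration. It shows that a title starting with digits is sorted by whatever letters remain, which is why the order can look random.

```python
import re


def alpha_only_sort_key(title: str) -> str:
    """Approximate the alphaOnlySort chain: lowercase, trim, drop non a-z."""
    lowered = title.lower()
    trimmed = lowered.strip()
    # Same pattern as the schema: every character outside a-z is removed.
    return re.sub(r"[^a-z]", "", trimmed)


# Hypothetical titles, chosen only to show the effect.
titles = ["2009 Annual Report", "10 Things", "Zebra 7", "apple"]

for title in titles:
    print(repr(title), "->", repr(alpha_only_sort_key(title)))

# Digits contribute nothing to the key, so "2009 Annual Report"
# is ordered as if it were "annualreport".
print(sorted(titles, key=alpha_only_sort_key))
```

A title made only of digits ends up with an empty key, so all such titles collapse onto the same sort position.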




--
Robert Muir
rcm...@gmail.com






Re: weird sorting behavior

2009-12-31 Thread Joel Nylund

Thanks Erick,

the null problem was introduced when I copied the example below. Now I
have the nulls excluded (using sortMissingLast=true) in 1.5 with
the suggested config below, and I'm still not seeing the desired behavior.


It seems to me that the default behavior of the Java Collator using
the ROOT locale (PRIMARY or SECONDARY don't seem to matter in this
example) is as follows:


empty string
symbols (by this I mean $,  , *, * etc)
numerics
alpha
leading spaces

My desire is:
alpha
numeric
symbols
leading spaces
empty string

I'm going to try a custom RuleBasedCollator to see if I can make this
happen, as Shalin suggested.


thanks
Joel
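
To make the desired order concrete, here is a small Python sketch of a sort key that groups titles the way the second list describes (alpha, numeric, symbols, leading spaces, empty string). It is not a RuleBasedCollator and not Solr configuration; the function name and sample values are assumptions used only to pin down the target behavior a custom rule set would have to reproduce.

```python
def desired_sort_key(title: str):
    """Return (group, casefolded title) so groups sort in the desired order."""
    if title == "":
        group = 4          # empty strings last
    elif title[0].isspace():
        group = 3          # leading spaces
    elif title[0].isalpha():
        group = 0          # alpha first
    elif title[0].isdigit():
        group = 1          # then numerics
    else:
        group = 2          # then symbols
    # casefold() makes the comparison inside a group case-insensitive.
    return (group, title.casefold())


# Hypothetical sample titles.
titles = ["", "  padded", "$cash", "42 ways", "Banana", "apple"]
ordered = sorted(titles, key=desired_sort_key)
print(ordered)
```

Only the first character decides the group here; a real collation rule set would also need to say how such characters compare in later positions.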



On Dec 31, 2009, at 11:11 AM, Erick Erickson wrote:

have you tried setting sortMissingLast=true in your schema.xml? Something
like...

<fieldType name="string" class="solr.StrField" sortMissingLast="true"
omitNorms="true"/>

or perhaps in your individual field definition instead. The schema.xml
examples have additional information that you really should scan at
least

HTH
Erick

On Thu, Dec 31, 2009 at 8:53 AM, Joel Nylund jnyl...@yahoo.com  
wrote:



Hi,

After some further investigation, it turns out that null fields were
sorting first, so if the title was null it was coming up first. This is true
even with 1.5 and collatedROOT. (I tried on last night's build.)

So let me change my question: how do I make items with null values sort
last?

thanks
Joel


On Dec 30, 2009, at 3:11 PM, Joel Nylund wrote:

Hi, so this is only available in 1.5?


I tried in 1.4 and got :

org.apache.solr.common.SolrException: Error loading class
'solr.CollationKeyFilterFactory'

Is there a way to do this in 1.4?

The link Shalin sent is a 1.5 link I think.

thanks
Joel

On Dec 25, 2009, at 10:52 PM, Robert Muir wrote:

Hello, as Shalin said, you might want to try  
CollationKeyFilterFactory.


Below is an example (using the multilingual root locale), where the
spaces will sort after the letters and numbers as you mentioned, but
it will still not be case-sensitive. This is because strength is
'secondary'.

But are you really sure you want the spaces sorted after the letters
and numbers? Or instead do you just want them ignored for sorting? If
this is the case, then try 'primary', so that spaces, punctuation,
accents and things like that, in addition to case, are ignored in the
sort: for example "Test-1234" and "  test1234" sort the same with
primary, but not with secondary (the one with leading spaces will
sort last).

If all else fails, you can write custom rules for it too, as Shalin
mentioned.

<fieldType name="collatedROOT" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.CollationKeyFilterFactory"
            language=""
            strength="secondary"
    />
  </analyzer>
</fieldType>

On Fri, Dec 25, 2009 at 5:37 AM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:



On Thu, Dec 24, 2009 at 11:51 PM, Joel Nylund jnyl...@yahoo.com
wrote:

update, I tried changing to datatype string, and it sorts the numerics
better, but the other sorts are not as good.

Is there a way to control sorting for special chars, for example, I want
blanks to sort after letters and numbers.


In the general case, CollationKeyFilterFactory will do the trick. You could
create a custom rule set which sorts spaces after letters and numbers. See

http://wiki.apache.org/solr/UnicodeCollation


using alphaOnlySort - sorts nicely for alpha, but numbers don't work
string - sorts nicely for numbers and letters, but special chars like
blanks show up first in the list


alphaOnlySort has a PatternReplaceFilterFactory which removes all characters
except a-z. This is the reason behind those weird results. You could try
removing that filter and see if that's what you need.

--
Regards,
Shalin Shekhar Mangar.





--
Robert Muir
rcm...@gmail.com










Re: weird sorting behavior

2009-12-30 Thread Joel Nylund

Hi, so this is only available in 1.5?

I tried in 1.4 and got :

org.apache.solr.common.SolrException: Error loading class  
'solr.CollationKeyFilterFactory'


Is there a way to do this in 1.4?

The link Shalin sent is a 1.5 link I think.

thanks
Joel


weird sorting behavior

2009-12-24 Thread Joel Nylund

I have a field:

  <field name="title" type="alphaOnlySort" indexed="true" stored="true" required="false"/>

<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizer does no actual tokenizing, so the entire
         input string is preserved as a single token
      -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- The LowerCase TokenFilter does what you expect, which can be
         when you want your sorting to be case insensitive
      -->
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- The TrimFilter removes any leading or trailing whitespace -->
    <filter class="solr.TrimFilterFactory" />
    <!-- The PatternReplaceFilter gives you the flexibility to use
         Java Regular expression to replace any sequence of characters
         matching a pattern with an arbitrary replacement string,
         which may include back references to portions of the original
         string matched by the pattern.

         See the Java Regular Expression documentation for more
         information on pattern and replacement string syntax.

         http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/package-summary.html
      -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^a-z])" replacement="" replace="all"
    />
  </analyzer>
</fieldType>


When I sort it using titles that are alphanumeric it works great, but  
if the titles start with numbers, it almost seems random. Any  
suggestions?


thanks
Joel



Re: weird sorting behavior

2009-12-24 Thread Joel Nylund
update, I tried changing to datatype string, and it sorts the numerics
better, but the other sorts are not as good.


Is there a way to control sorting for special chars, for example, I
want blanks to sort after letters and numbers.


using alphaOnlySort - sorts nicely for alpha, but numbers don't work
string - sorts nicely for numbers and letters, but special chars like
blanks show up first in the list


thanks
Joel



suggestions for DIH batchSize

2009-12-22 Thread Joel Nylund

Hi,

From looking at the code, it looks like the default is 500. Is that the
recommended setting?


Has anyone noticed any significant performance/memory tradeoffs from
making this much bigger?


thanks
Joel



Re: Request Assistance with DIH

2009-12-14 Thread Joel Nylund
Hi, sorry, I'm not familiar with the DataImportHandler development
console; I thought you were just trying to do an import.


For me to try to import data, I would do:

http://localhost:8983/solr/dataimport?command=full-import


Then check status of it using:

http://localhost:8983/solr/dataimport


You can refresh this screen as many times as you want. It should
show progress and whether the import worked or not; you should also see
any errors in the log.


I used this to get started with DIH. Even though it's for MySQL, it might
help you:

http://www.cabotsolutions.com/blog/200905/using-solr-lucene-for-full-text-search-with-mysql/

Joel


On Dec 14, 2009, at 10:27 AM, Turner, Robbin J wrote:

How does this help answer my question?  I am trying to use the  
DATAImportHandler Development console.  The url you suggest assumes  
I had it working already.


Looking at my logs and the response to the Development console, it  
does not appear that the connection to Oracle is being made.


So if someone could offer some configuration/connection setup  
directions I would very much appreciate it.


Thanks
Robbin

-----Original Message-----
From: Joel Nylund [mailto:jnyl...@yahoo.com]
Sent: Friday, December 11, 2009 8:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Request Assistance with DIH

add ?command=full-import to your url

http://localhost:8983/solr/dataimport?command=full-import

thanks
Joel

On Dec 11, 2009, at 7:45 PM, Robbin wrote:


I've been trying to use the DIH with oracle and would love it if
someone could give me some pointers.  I put the ojdbc14.jar in both
the Tomcat lib and <solr home>/lib.  I created a dataimport.xml and
enabled it in the solrconfig.xml.  I go to
http://<solr server>/solr/admin/dataimport.jsp.  This all seems to be fine, but I get the
default page response and it doesn't look like the connection to the
oracle server is even attempted.

I'm using the Solr 1.4 release on Nov 10.
Do I need an oracle client on the server?  I thought having the ojdbc
jar should be sufficient.  Any help or configuration examples for
setting this up would be much appreciated.

Thanks
Robbin






Re: Auto update with deltaimport

2009-12-12 Thread Joel Nylund

windows or unix?

unix - make a shell script and call it from cron

windows - make a .bat or .cmd file and call it from scheduler

within the shell scripts/bat files use wget or curl to call the right  
import:


wget -q -O /dev/null "http://localhost:8983/solr/dataimport?command=delta-import"


Joel
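
If a script is preferred over a bare wget line, the same call can be sketched in Python using only the standard library. The host, port and handler path are taken from the wget example above; the function names, the timeout value and the cron line in the comment are assumptions for illustration, not part of Solr.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def dih_url(base: str = "http://localhost:8983/solr",
            command: str = "delta-import") -> str:
    """Build the DataImportHandler URL for the given command."""
    return f"{base}/dataimport?{urlencode({'command': command})}"


def trigger(url: str, timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code."""
    with urlopen(url, timeout=timeout) as response:
        return response.status


# Nothing is sent at import time; a scheduler would call trigger(dih_url()).
# A hypothetical crontab entry running such a script every 15 minutes:
#   */15 * * * * /usr/bin/python3 /path/to/delta_import.py
print(dih_url())
```

Calling dih_url(command="full-import") gives the full-import URL used elsewhere in this thread.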

On Dec 12, 2009, at 1:38 AM, Olala wrote:



Hi All!

I am developing a search engine using Solr, and I have tested the full-import
and delta-import commands successfully. But now I want to run delta-import
automatically on a schedule. So, can anyone help me?

Thanks & Regards,
--
View this message in context: 
http://old.nabble.com/Auto-update-with-deltaimport-tp26755386p26755386.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Request Assistance with DIH

2009-12-11 Thread Joel Nylund

add ?command=full-import to your url

http://localhost:8983/solr/dataimport?command=full-import

thanks
Joel

On Dec 11, 2009, at 7:45 PM, Robbin wrote:

I've been trying to use the DIH with oracle and would love it if
someone could give me some pointers.  I put the ojdbc14.jar in both
the Tomcat lib and <solr home>/lib.  I created a dataimport.xml and
enabled it in the solrconfig.xml.  I go to
http://<solr server>/solr/admin/dataimport.jsp.  This all seems to be fine, but I get the
default page response and it doesn't look like the connection to the
oracle server is even attempted.


I'm using the Solr 1.4 release on Nov 10.
Do I need an oracle client on the server?  I thought having the  
ojdbc jar should be sufficient.  Any help or configuration examples  
for setting this up would be much appreciated.


Thanks
Robbin




Re: # in query

2009-12-08 Thread Joel Nylund

Thanks Erick,

I looked more into this, but I'm still stuck:

I have this field indexed using text_rev

I looked at the luke analysis for this field, but I'm unsure how to
read it.


When I query the field by the id I get:

<result name="response" numFound="1" start="0">
  <doc>
    <str name="id">5405255</str>
    <str name="textTitle">###'s test blog</str>
  </doc>
</result>

If I try to query even multiple ### I get nothing.

Here is what luke handler says:  (btw when I used id instead of docid
on luke I got a nullpointer exception:  /admin/luke?docid=5405255  vs
/admin/luke?id=5405255)


<lst name="textTitle">
  <str name="type">text_rev</str>
  <str name="schema">ITS---</str>
  <str name="index">ITS--</str>
  <int name="docs">290329</int>
  <int name="distinct">401016</int>
  <lst name="topTerms">
    <int name="&#1;golb">49362</int>
    <int name="blog">49362</int>
    <int name="&#1;ecapsym">29426</int>
    <int name="myspace">29426</int>
    <int name="&#1;s">8773</int>
    <int name="s">8773</int>
    <int name="&#1;ed">8033</int>
    <int name="de">8033</int>
    <int name="com">6884</int>
    <int name="&#1;moc">6884</int>
  </lst>
  <lst name="histogram">
    <int name="1">308908</int>
    <int name="2">34340</int>
    <int name="4">21916</int>
    <int name="8">14474</int>
    <int name="16">9122</int>
    <int name="32">5578</int>
    <int name="64">3162</int>
    <int name="128">1844</int>
    <int name="256">910</int>
    <int name="512">464</int>
    <int name="1024">182</int>
    <int name="2048">72</int>
    <int name="4096">26</int>
    <int name="8192">12</int>
    <int name="16384">2</int>
    <int name="32768">2</int>
    <int name="65536">2</int>
  </lst>
</lst>


solr/select?q=textTitle:%23%23%23  - gets no results.

I have the same field indexed as an alphaOnlySort, and it gives me lots
of results, but not the ones I want.


Any other ideas?

thanks
Joel


On Dec 7, 2009, at 3:42 PM, Erick Erickson wrote:


Well, the very first thing I would do is examine the field definition in
your schema file. I suspect that the tokenizers and/or
filters you're using for indexing and/or querying are doing something
to the # symbol. Most likely stripping it. If you're just searching
for the single-letter term #, I *think* the query parser silently just
drops that part of the clause out, but check on that.

The second thing would be to get a copy of Luke and examine your
index to see if what you *think* is in your index actually is there.

HTH
Erick

On Mon, Dec 7, 2009 at 3:28 PM, Joel Nylund jnyl...@yahoo.com wrote:

ok thanks, sorry my brain wasn't working, but even when I url encode it, I
don't get any results. Is there something special I have to do for solr?


thanks
Joel


On Dec 7, 2009, at 3:20 PM, Paul Libbrecht wrote:

Sure you have to escape it! %23

Otherwise the browser considers it as a separator between the URL for the
server (on the left) and the fragment identifier (on the right), which is not
sent to the server.

You might want to read about URL-encoding; escaping with backslash is a
shell thing, not a thing for URLs!

paul
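
Paul's point can be checked with Python's standard urllib.parse module. This is a small sketch, using the URL from the question: it shows that a raw # ends the query string, and that percent-encoding it as %23 keeps it inside the q parameter. Whether Solr then finds anything still depends on how the field is analyzed, as discussed elsewhere in this thread.

```python
from urllib.parse import quote, urlsplit

# Raw URL as first attempted: everything after '#' is a fragment
# and is never sent to the server.
raw = "http://localhost:8983/solr/select?q=textTitle:#"
raw_parts = urlsplit(raw)
print("query sent to server:", repr(raw_parts.query))
print("fragment kept by client:", repr(raw_parts.fragment))

# Percent-encode the query value; ':' is left alone, '#' becomes %23.
encoded_value = quote("textTitle:#", safe=":")
encoded = "http://localhost:8983/solr/select?q=" + encoded_value
encoded_parts = urlsplit(encoded)
print("encoded URL:", encoded)
print("query sent to server:", repr(encoded_parts.query))
```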


Le 07-déc.-09 à 21:16, Joel Nylund a écrit :

Hi,


How can I put a # sign in a query, do I need to escape it?

For example I want to query books with title that contain #

No work so far:
http://localhost:8983/solr/select?q=textTitle:#;
http://localhost:8983/solr/select?q=textTitle:#
http://localhost:8983/solr/select?q=textTitle:\#;

Getting
org.apache.lucene.queryParser.ParseException: Cannot parse 'textTitle:\':
Lexical error at line 1, column 12.  Encountered: <EOF> after : ""

and sometimes just no response.


thanks
Joel










Re: # in query

2009-12-08 Thread Joel Nylund
ok, I just realized I was using the luke handler, didn't know there was
a fat client, I assume that's what you are talking about.


I downloaded the lukeall.jar, ran it, pointed to my index, found the
document in question, didn't see how it was tokenized, but I clicked
the "Reconstruct & Edit" button,


this gives me a tab that has tokenized per field, for this field it
shows:



s|s, ecapsym|myspace, golb|blog

title is: ###'s myspace blog

schema is:

<!-- A general unstemmed text field that indexes tokens normally and also
     reversed (via ReversedWildcardFilterFactory), to enable more efficient
     leading wildcard queries. -->
<fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
    />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


  <field name="textTitle" type="text_rev" indexed="true" stored="true"
         required="false" multiValued="false"/>




thanks
Joel
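
The token list Luke shows for this title can be roughly reproduced with a simplified emulation of the index-time chain. This is a Python sketch, not Solr code, and it rests on assumptions: it only models whitespace splitting, word-delimiter splitting on non-alphanumerics, lowercasing, and the reversed copy of each token; stop words and the other filter options are ignored, and the U+0001 marker is inferred from the "#1;" prefix in the luke handler output. The function name is made up.

```python
import re

# Marker assumed to prefix reversed tokens (seen as "#1;" in the luke output).
MARKER = "\u0001"


def text_rev_index_tokens(text: str) -> list:
    """Very rough emulation of the text_rev index analyzer."""
    tokens = []
    for chunk in text.split():                       # whitespace tokenizer
        # Word-delimiter step: keep only runs of letters and digits.
        for part in re.findall(r"[A-Za-z0-9]+", chunk):
            word = part.lower()                      # lowercase filter
            tokens.append(word)                      # original token
            tokens.append(MARKER + word[::-1])       # reversed copy
    return tokens


tokens = text_rev_index_tokens("###'s myspace blog")
print([t.replace(MARKER, "<U+0001>") for t in tokens])
```

Under these assumptions no token contains a # character, which would be consistent with textTitle:%23%23%23 returning nothing.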




On Dec 8, 2009, at 11:14 AM, Erick Erickson wrote:

In Luke, there's a tab that will let you go to a document ID. From there
you can see all the fields in a particular document, and examine what
the actual tokens stored are. Until and unless you know what tokens
are being indexed, you simply can't know what your queries should look
like...

*Assuming* that the ### are getting indexed and *assuming* your tokenizer
tokenizes on whitespace, and *assuming* that by text_rev you
are talking about ReversedWildcardFilterFactory, I
wouldn't expect a search to match if it wasn't exactly:
s'###. But as you see, there's a long chain of assumptions there, any
one of which may be violated by your schema. So please post the
relevant portions of your schema to make it easier to help.

Best
Erick



# in query

2009-12-07 Thread Joel Nylund

Hi,

How can I put a # sign in a query, do I need to escape it?

For example I want to query books with title that contain #

No work so far:
http://localhost:8983/solr/select?q=textTitle:#;
http://localhost:8983/solr/select?q=textTitle:#
http://localhost:8983/solr/select?q=textTitle:\#;

Getting
org.apache.lucene.queryParser.ParseException: Cannot parse 'textTitle:\':
Lexical error at line 1, column 12.  Encountered: <EOF> after : ""


and sometimes just no response.


thanks
Joel



Re: # in query

2009-12-07 Thread Joel Nylund
ok thanks, sorry my brain wasn't working, but even when I url encode
it, I don't get any results. Is there something special I have to do
for solr?


thanks
Joel

On Dec 7, 2009, at 3:20 PM, Paul Libbrecht wrote:


Sure you have to escape it! %23

Otherwise the browser considers it as a separator between the URL
for the server (on the left) and the fragment identifier (on the
right), which is not sent to the server.


You might want to read about URL-encoding; escaping with backslash
is a shell thing, not a thing for URLs!


paul


Le 07-déc.-09 à 21:16, Joel Nylund a écrit :


Hi,

How can I put a # sign in a query, do I need to escape it?

For example I want to query books with title that contain #

No work so far:
http://localhost:8983/solr/select?q=textTitle:#;
http://localhost:8983/solr/select?q=textTitle:#
http://localhost:8983/solr/select?q=textTitle:\#;

Getting
org.apache.lucene.queryParser.ParseException: Cannot parse
'textTitle:\': Lexical error at line 1, column 12.  Encountered:
<EOF> after : ""


and sometimes just no response.


thanks
Joel







how to get list of unique terms for a field

2009-12-04 Thread Joel Nylund

Hi,

Let's say I have a field called countryName. Is there a way to get a
list of all the countries for this field? I'm trying to figure out a nice
way to keep my categories and the solr results in sync; it would be nice
to get these from solr instead of the database.


thanks
Joel



weird behavior between 2 enviorments

2009-12-03 Thread Joel Nylund

I have 2 environments; one works great for this query:

my osx environment:

http://localhost:8983/solr/select?q=countryName:%22Bosnia%20and%20Herzegovina%22 
  - returns 2 results


my linux environment:

http://localhost:8983/solr/select?q=countryName:%22Bosnia%20and%20Herzegovina%22 
  - returns 0 results



same configs, same index etc, both using solr 1.4, in linux env if I  
run this query:


/solr/select?q=id:96465437

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="q">id:96465437</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      ...
      <str name="countryName">Bosnia and Herzegovina</str>
    </doc>
  </result>
</response>

So the records are in the index.

I checked the admin, they are indexed using the same type (text), and  
I cannot see any differences.


any idea why it works on one env and not the other? anything I can  
check in admin to get to the bottom of this?


thanks
Joel



Re: weird behavior between 2 enviorments

2009-12-03 Thread Joel Nylund

thanks that was it

Joel

On Dec 3, 2009, at 11:06 AM, Yonik Seeley wrote:


The schemas probably aren't the same. Looks like one has position
increments enabled for the stopword filter in the field type, and one
doesn't.

-Yonik
http://www.lucidimagination.com
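
Yonik's diagnosis can be illustrated with a toy model of token positions. This is a Python sketch, not Lucene code: the stop word list, function names and matching logic are simplified assumptions. It shows that when stop word removal leaves a position gap at index time, a phrase query analyzed without that gap ("bosnia herzegovina") no longer lines up, while one analyzed with the gap ("bosnia ? herzegovina") does.

```python
STOPWORDS = {"and"}  # assumed stop word list for this illustration


def analyze(text, enable_position_increments):
    """Return (term, position) pairs after lowercasing and stop word removal."""
    tokens = []
    position = -1
    for word in text.lower().split():
        if word in STOPWORDS:
            if enable_position_increments:
                position += 1      # keep a gap where the stop word was
            continue
        position += 1
        tokens.append((word, position))
    return tokens


def phrase_matches(indexed, query):
    """Exact phrase match: query terms must appear at the same relative offsets."""
    if not query:
        return False
    positions = {}
    for term, pos in indexed:
        positions.setdefault(term, set()).add(pos)
    first_term, first_pos = query[0]
    for start in positions.get(first_term, ()):
        if all(start + (pos - first_pos) in positions.get(term, ())
               for term, pos in query):
            return True
    return False


indexed = analyze("Bosnia and Herzegovina", enable_position_increments=True)
query_with_gap = analyze("Bosnia and Herzegovina", enable_position_increments=True)
query_without_gap = analyze("Bosnia and Herzegovina", enable_position_increments=False)

print("with increments:", phrase_matches(indexed, query_with_gap))
print("without increments:", phrase_matches(indexed, query_without_gap))
```

Which side of the real setup differed (index analyzer or query analyzer) is not stated in the thread; the sketch only assumes the index kept the gap.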



On Thu, Dec 3, 2009 at 11:00 AM, Joel Nylund jnyl...@yahoo.com  
wrote:
same client, here are the debug results. Something interesting is going on;
I don't understand solr/lucene well enough to interpret it, see below

not working env (linux)

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
    <lst name="params">
      <str name="debugQuery">true</str>
      <str name="q">countryName:"Bosnia and Herzegovina"</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="debug">
    <str name="rawquerystring">countryName:"Bosnia and Herzegovina"</str>
    <str name="querystring">countryName:"Bosnia and Herzegovina"</str>
    <str name="parsedquery">PhraseQuery(countryName:"bosnia herzegovina")</str>
    <str name="parsedquery_toString">countryName:"bosnia herzegovina"</str>
    <lst name="explain"/>
    <str name="QParser">LuceneQParser</str>
    <lst name="timing">
      <double name="time">2.0</double>
      <lst name="prepare">
        <double name="time">1.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent">
          <double name="time">1.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.FacetComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.StatsComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.DebugComponent">
          <double name="time">0.0</double>
        </lst>
      </lst>
      <lst name="process">
        <double name="time">1.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.FacetComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.StatsComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.DebugComponent">
          <double name="time">0.0</double>
        </lst>
      </lst>
    </lst>
  </lst>
</response>




working env (osx)

response
-
lst name=responseHeader
int name=status0/int
int name=QTime54/int
-
lst name=params
str name=qcountryName:Bosnia and Herzegovina/str
str name=debugQuerytrue/str
/lst
/lst
-
result name=response numFound=2 start=0
-
doc
str name=countryNameBosnia and Herzegovina/str
str name=id83964763/str
/doc
-
doc
str name=countryNameBosnia and Herzegovina/str
str name=id96465437/str
/doc
/result
-
lst name=debug
str name=rawquerystringcountryName:Bosnia and Herzegovina/str
str name=querystringcountryName:Bosnia and Herzegovina/str
str name=parsedqueryPhraseQuery(countryName:bosnia ? herzegovina)/str
str name=parsedquery_toStringcountryName:bosnia ? herzegovina/str
-
lst name=explain
-
str name=83964763
15.619301 = fieldWeight(countryName:bosnia herzegovina in 260955), product of:
1.0 = tf(phraseFreq=1.0)
24.990881 = idf(countryName: bosnia=2 herzegovina=2)
0.625 = fieldNorm(field=countryName, doc=260955)
/str
-
str name=96465437
15.619301 = fieldWeight(countryName:bosnia herzegovina in 275091), product of:
1.0 = tf(phraseFreq=1.0)
24.990881 = idf(countryName: bosnia=2 herzegovina=2)
0.625 = fieldNorm(field=countryName, doc=275091)
/str
/lst
str name=QParserLuceneQParser/str
-
lst name=timing
double name=time53.0/double
-
lst name=prepare
double name=time24.0/double
-
lst name=org.apache.solr.handler.component.QueryComponent
double name=time0.0/double
/lst
-
lst name=org.apache.solr.handler.component.FacetComponent
double name=time0.0/double
/lst
-
lst name=org.apache.solr.handler.component.MoreLikeThisComponent
double name=time0.0/double
/lst
-
lst name=org.apache.solr.handler.component.HighlightComponent
double name=time0.0/double
/lst
-
lst name=org.apache.solr.handler.component.StatsComponent
double name=time0.0/double
/lst
-
lst name=org.apache.solr.handler.component.DebugComponent
double name=time0.0/double
/lst
/lst
-
lst name=process
double name=time27.0/double
-
lst name=org.apache.solr.handler.component.QueryComponent
double name=time0.0/double
/lst
-
lst name=org.apache.solr.handler.component.FacetComponent
double name=time0.0/double
/lst
-
lst name=org.apache.solr.handler.component.MoreLikeThisComponent
double name=time0.0/double
/lst
-
lst name=org.apache.solr.handler.component.HighlightComponent
double name=time0.0/double
/lst
-
lst name=org.apache.solr.handler.component.StatsComponent
double name=time0.0/double
/lst
-
lst name=org.apache.solr.handler.component.DebugComponent
double name=time27.0/double
/lst
/lst
/lst
/lst
/response
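
For readers following the explain section of the debug output above, here is a small sketch of how the reported score is put together. It only re-multiplies the three factors printed in the response (tf, idf, fieldNorm); the function name is made up for illustration and is not part of Solr.

```python
# Factors copied from the explain output for doc 260955 in the thread.
tf = 1.0            # tf(phraseFreq=1.0)
idf = 24.990881     # idf(countryName: bosnia=2 herzegovina=2)
field_norm = 0.625  # fieldNorm(field=countryName, doc=260955)


def field_weight(tf, idf, field_norm):
    """Hypothetical helper: fieldWeight is reported as the product of its parts."""
    return tf * idf * field_norm


score = field_weight(tf, idf, field_norm)
# Compare against the 15.619301 printed by Solr, allowing for float rounding.
print(abs(score - 15.619301) < 1e-5)
```

Both documents carry the same three factors, which is why they tie on score.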


On Dec 3, 2009, at 10:20 AM, Yonik Seeley wrote:


Are you querying both systems from the same browser / client?
Try adding

debugging javascript DIH

2009-12-03 Thread Joel Nylund
Is there a way to print to stdout (or log anything at all) from my JavaScript DIH  
transformer?


thanks
Joel


getting value from parent query in subquery transformer

2009-12-02 Thread Joel Nylund
Hi, I have an entity with a nested entity that executes a query  
for each row and calls a transformer. Is there a way to pass a value  
from the parent query into the transformer?


For example, I have an entity called document; it has an ID and  
sometimes a category.


I have a sub-entity called category that runs another complex query,  
using the document's ID, to get data to send to the transformer to  
determine the category. I would like to pass the parent's category to  
this transformer so I don't have to join in data I already have. Is  
this possible?


I'm using ${item.id} in the where clause, so I guess I'm wondering whether  
I can do something like:


<entity name="item" query="..">
	<entity name="category" transformer="script:SplitAndPrettyCategory(${item.category})" query="..">
	</entity>
</entity>


thanks
Joel


NOT combined with OR is not getting expected results

2009-12-02 Thread Joel Nylund
http://localhost:8983/solr/select?q=%28NOT%20categoryType:%22MEDIATYPE%22%29 
   :gives 292289 results



http://localhost:8983/solr/select?q=fmMediaType:%22text%22   :gives  
530 results



http://localhost:8983/solr/select?q=%28NOT%20categoryType:%22MEDIATYPE%22%29%20OR%20fmMediaType:%22text%22 
   :gives 530 results


I expected a number higher than the first query.

thanks
Joel



Re: NOT combined with OR is not getting expected results

2009-12-02 Thread Joel Nylund

Hi, thanks, but I still get 530 results for the new query you proposed.

thanks
Joel

On Dec 2, 2009, at 12:00 PM, AHMET ARSLAN wrote:

http://localhost:8983/solr/select?q=%28NOT%20categoryType:%22MEDIATYPE%22%29 
   :gives

292289 results


http://localhost:8983/solr/select?q=fmMediaType:%22text%22   :gives
530 results


http://localhost:8983/solr/select?q=%28NOT%20categoryType:%22MEDIATYPE%22%29%20OR%20fmMediaType:%22text%22 
   :gives

530 results

I expected a number higher than the first query.



The NOT operator behaves a little differently: it acts like a filter,  
so you can't combine OR and NOT directly.


Try this:
q=(categoryType:[* TO *] NOT categoryType:MEDIATYPE) OR fmMediaType:text



Solr allows the query q=(NOT categoryType:MEDIATYPE), but it is effectively  
treated as q=*:* NOT categoryType:MEDIATYPE


Hope this helps.









Re: NOT combined with OR is not getting expected results

2009-12-02 Thread Joel Nylund

thanks that worked! and yes I have some with no categoryType

thanks
Joel

On Dec 2, 2009, at 2:24 PM, AHMET ARSLAN wrote:


Hi, thanks, but still get 530 results
for this new query your proposed.



Maybe you have some documents that have an empty categoryType field.
Can you try this:
q = ((*:* -categoryType:MEDIATYPE) OR fmMediaType:text)

It should return at least 292289 documents.
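
The query shape suggested above can be sketched as a small URL builder. This is only an illustration under assumptions: the helper name is invented, the host and port are the ones used in the thread, and the values are quoted as in the original URLs. The point is that the negative clause is given an explicit `*:*` to subtract from before it is OR-ed with anything else.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # URL used in the thread


def negated_or_query(negated_field, negated_value, or_field, or_value):
    """Hypothetical helper: build '(*:* -f:"v") OR g:"w"'.

    A purely negative clause matches nothing on its own inside a larger
    boolean query, so it is anchored to the match-all query first.
    """
    return f'(*:* -{negated_field}:"{negated_value}") OR {or_field}:"{or_value}"'


q = negated_or_query("categoryType", "MEDIATYPE", "fmMediaType", "text")
url = SOLR_SELECT + "?" + urlencode({"q": q})
print(q)
print(url)
```

Letting `urlencode` do the escaping avoids hand-typing `%28`, `%22` and friends as in the original URLs.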







Re: getting total index size & last update date/time from query

2009-12-01 Thread Joel Nylund
Hi, Luke worked, but we are finding it really slow in our environment  
(8-10 seconds). Is there a way to get just the document count and last index  
time with a faster call, possibly by passing something to Luke?


thanks
Joel

On Nov 19, 2009, at 11:54 AM, Binkley, Peter wrote:

The Luke request handler (normally available at solr/admin/luke) will
give you the document count (not size on the disk, though, if that's
what you want) and last update and other info:

<lst name="index">
<int name="numDocs">14591</int>
<int name="maxDoc">14598</int>
<int name="numTerms">128730</int>
<long name="version">1196962176380</long>
<bool name="optimized">false</bool>
<bool name="current">true</bool>
<bool name="hasDeletions">true</bool>
<str name="directory"> s/solr/data/index</str>
<date name="lastModified">2009-11-19T16:44:45Z</date>
</lst>

See http://wiki.apache.org/solr/LukeRequestHandler

Peter




-Original Message-
From: Joel Nylund [mailto:jnyl...@yahoo.com]
Sent: Thursday, November 19, 2009 8:31 AM
To: solr-user@lucene.apache.org
Subject: getting total index size & last update date/time from query

Hi,

Looking for total number of documents in my index and the
last updated date/time of the index.

Is there a way to get this through the standard query q=?

if not, what is the best way to get this info from solr.

thanks
Joel
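
As a rough sketch of the two lighter-weight requests this thread is circling around: a match-all query with `rows=0` (the total is still reported in `numFound`), and the Luke handler called with `numTerms=0` so it skips the per-field top-terms listing. The base URL and helper names are assumptions for illustration, the sample XML is hand-written rather than real output, and whether `numTerms=0` removes the slowness in a given environment is something to measure, not a guarantee.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE = "http://localhost:8983/solr"  # assumed base URL, as used in the thread


def count_url():
    # rows=0 returns no documents, but numFound still carries the total.
    return BASE + "/select?" + urlencode({"q": "*:*", "rows": 0})


def luke_url():
    # numTerms=0 asks Luke not to collect top terms per field.
    return BASE + "/admin/luke?" + urlencode({"numTerms": 0})


def num_found(response_xml):
    """Read numFound from a Solr XML response body."""
    root = ET.fromstring(response_xml)
    return int(root.find("result").attrib["numFound"])


# Hand-written sample in the shape of a Solr XML response (not real output).
sample = '<response><result name="response" numFound="14591" start="0"/></response>'
print(count_url())
print(luke_url())
print(num_found(sample))
```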






Re: how to do partial word searches?

2009-11-25 Thread Joel Nylund

Hi Erick,

Thanks for the links. I read both of them and I still have no idea  
what to do: lots of back and forth, but I didn't see any solution.


One person talked about indexing the field in reverse and doing an OR  
on it; this might work, I guess.


thanks
Joel
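
The reversed-field idea mentioned in this message can be illustrated without Solr at all. The sketch below is a toy model, not Solr code: it stores each token reversed, so a leading-wildcard search such as `*van` turns into an ordinary prefix match on `nav`. The function names are invented for the example. Note that this only covers a wildcard on one side; a term wrapped in wildcards on both sides, like `*sulli*`, is not helped by reversal alone.

```python
def reverse_token(token):
    return token[::-1]


def suffix_match(indexed_tokens, suffix):
    """Emulate '*suffix' by prefix-matching against reversed tokens."""
    reversed_index = {reverse_token(t): t for t in indexed_tokens}
    prefix = reverse_token(suffix)
    return sorted(
        original
        for reversed_form, original in reversed_index.items()
        if reversed_form.startswith(prefix)
    )


# Tokens from the title quoted in the thread.
tokens = ["the", "daily", "dish", "by", "andrew", "sullivan"]
print(suffix_match(tokens, "van"))
```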


On Nov 24, 2009, at 9:12 PM, Erick Erickson wrote:


copying from Erik Hatcher:

See http://issues.apache.org/jira/browse/SOLR-218 - Solr currently
does not have leading wildcard support enabled.

There's a pretty extensive recent exchange on this, see the
thread on the user's list titled

leading and trailing wildcard query

Best
Erick

On Tue, Nov 24, 2009 at 7:51 PM, Joel Nylund jnyl...@yahoo.com  
wrote:



Hi, I saw some older postings on this, but didnt see a resolution.

I have a field called title, I would like to be able to find  
partial word

matches within the title.

For example:

http://localhost:8983/solr/select?q=textTitle:%22*sulli*%22

I would expect it to find:
str name=textTitlethe daily dish | by andrew sullivan/str

but it doesnt, it does find sully (which is fine with me also as a  
bonus),
but doesnt seem to get any of the partial word stuff. Oddly enough  
before I
lowercased the title, the wildcard matching seemed to work a bit  
better, it

just didnt deal with the case sensitive query.

At first I had mixed case titles and I read that the wildcard  
doesn't work
with mixed case, so I created another field that is a lowered  
version of the

title called textTitle, it is of type text.

Is it possible with solr to achieve what I am trying to do, if so  
how? If

not, anything closer than what I have?

thanks
Joel






Re: solr/jetty not working for anything other than localhost

2009-11-25 Thread Joel Nylund

I see:

tcp46  0  0  *.8983 *.* LISTEN
tcp4   0  0  127.0.0.1.8983 *.* LISTEN


thanks
Joel

On Nov 25, 2009, at 5:21 PM, simon wrote:


first, check what port 8983 is bound to - should be listening on all
interfaces

netstat -an |grep 8983

You should see

tcp  0  0 0.0.0.0:8983  0.0.0.0:*  LISTEN


-Simon

On Wed, Nov 25, 2009 at 3:55 PM, Joel Nylund jnyl...@yahoo.com  
wrote:


Hi, if I try to use any other hostname, Jetty doesn't work: it gives a  
blank page, and if I telnet to the server/port it just disconnects.

I tried editing scripts.conf to change the hostname, but that didn't  
seem to help.

For example, I edited my /etc/hosts file and added:

127.0.0.1 solriscool

then:
ping solriscool
PING solriscool (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.055 ms
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.095 ms


sh-3.2# telnet solriscool 8983
Trying 127.0.0.1...
Connected to solriscool.
Escape character is '^]'.
GET / HTTP/1.1
Connection closed by foreign host.


telnet localhost 8983
Trying ::1...
Connected to localhost.
Escape character is '^]'.
GET /solr HTTP/1.1
Host: localhost

HTTP/1.1 302 Found
Location: http://localhost/solr/
Content-Length: 0
Server: Jetty(6.1.3)


any ideas?

thanks
Joel






Re: solr/jetty not working for anything other than localhost

2009-11-25 Thread Joel Nylund

yes says:

2009-11-25 18:08:59.967::INFO:  Started SocketConnector @ 0.0.0.0:8983

running on osx

thanks
Joel


On Nov 25, 2009, at 6:00 PM, simon wrote:

On Wed, Nov 25, 2009 at 5:27 PM, Joel Nylund jnyl...@yahoo.com  
wrote:



I see:

tcp46  0  0  *.8983 *.* 
LISTEN
tcp4   0  0  127.0.0.1.8983 *.* 
LISTEN




Not the same version of linux/netstat as mine, but I'd guess that the
second line is the key to the problem: it looks as though TCP over  
IPv4 is only listening on the localhost interface, which is a network  
configuration issue.

What does the Solr log say after it's started? There should be a line

INFO:  Started SelectChannelConnector @ 0.0.0.0:8983


-Simon












Re: help with dataimport delta query

2009-11-24 Thread Joel Nylund
Thanks that was it, well really this part:

${dataimporter.delta.job_jobs_id}

I thought the jobs_id was part of the DIH, but I guess it was just the example, 
duh!

thanks
Joel


--- On Tue, 11/24/09, Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com 
wrote:

 From: Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com
 Subject: Re: help with dataimport delta query
 To: solr-user@lucene.apache.org
 Date: Tuesday, November 24, 2009, 12:15 AM
 I guess the field names do not match.
 In the deltaQuery you are selecting the field id,
 
 and in the deltaImportQuery you use the field as
 ${dataimporter.delta.job_jobs_id}
 I guess it should be ${dataimporter.delta.id}
 
 On Tue, Nov 24, 2009 at 1:19 AM, Joel Nylund jnyl...@yahoo.com
 wrote:
  Hi, I have solr all working nicely, except im trying
 to get deltas to work
  on my data import handler
 
  Here is a simplification of my data import config, I
 have a table called
  Book which has categories, im doing subquries for
 the category info and
  calling a javascript helper. This all works perfectly
 for the regular query.
 
  I added these lines for the delta stuff:
 
         deltaImportQuery=SELECT f.id,f.title
                         FROM Book f
                       
  f.id='${dataimporter.delta.job_jobs_id}'
                 deltaQuery=SELECT id FROM
 `Book` WHERE fm.inMyList=1 AND
  lastModifiedDate 
 '${dataimporter.last_index_time}'  
 
  basically im trying to rows that lastModifiedDate is
 newer than the last
  index (or deltaindex).
 
  I run:
  http://localhost:8983/solr/dataimport?command=delta-import
 
  And it says in logs:
 
  Nov 23, 2009 2:33:02 PM
 org.apache.solr.handler.dataimport.DataImporter
  doDeltaImport
  INFO: Starting Delta Import
  Nov 23, 2009 2:33:02 PM
 org.apache.solr.handler.dataimport.SolrWriter
  readIndexerProperties
  INFO: Read dataimport.properties
  Nov 23, 2009 2:33:02 PM
 org.apache.solr.handler.dataimport.DocBuilder
  doDelta
  INFO: Starting delta collection.
  Nov 23, 2009 2:33:02 PM org.apache.solr.core.SolrCore
 execute
  INFO: [] webapp=/solr path=/dataimport
 params={command=delta-import}
  status=0 QTime=0
  Nov 23, 2009 2:33:02 PM
 org.apache.solr.handler.dataimport.DocBuilder
  collectDelta
  INFO: Running ModifiedRowKey() for Entity: category
  Nov 23, 2009 2:33:02 PM
 org.apache.solr.handler.dataimport.DocBuilder
  collectDelta
  INFO: Completed ModifiedRowKey for Entity: category
 rows obtained : 0
  Nov 23, 2009 2:33:02 PM
 org.apache.solr.handler.dataimport.DocBuilder
  collectDelta
  INFO: Completed DeletedRowKey for Entity: category
 rows obtained : 0
  Nov 23, 2009 2:33:02 PM
 org.apache.solr.handler.dataimport.DocBuilder
  collectDelta
  INFO: Completed parentDeltaQuery for Entity: category
  Nov 23, 2009 2:33:02 PM
 org.apache.solr.handler.dataimport.DocBuilder
  collectDelta
  INFO: Running ModifiedRowKey() for Entity: item
  Nov 23, 2009 2:33:02 PM
 org.apache.solr.handler.dataimport.DocBuilder
  collectDelta
  INFO: Completed ModifiedRowKey for Entity: item rows
 obtained : 0
  Nov 23, 2009 2:33:02 PM
 org.apache.solr.handler.dataimport.DocBuilder
  collectDelta
  INFO: Completed DeletedRowKey for Entity: item rows
 obtained : 0
  Nov 23, 2009 2:33:02 PM
 org.apache.solr.handler.dataimport.DocBuilder
  collectDelta
  INFO: Completed parentDeltaQuery for Entity: item
  Nov 23, 2009 2:33:02 PM
 org.apache.solr.handler.dataimport.DocBuilder
  doDelta
  INFO: Delta Import completed successfully
  Nov 23, 2009 2:33:02 PM
 org.apache.solr.handler.dataimport.DocBuilder
  execute
  INFO: Time taken = 0:0:0.21
 
  But the browser says no documents added/modified (even
 though one record in
  db is a match)
 
  Is there a way to turn debugging so I can see the
 queries the DIH is sending
  to the db?
 
  Any other ideas of what I could be doing wrong?
 
  thanks
  Joel
 
 
  document name=doc
     entity name=item
       query=SELECT f.id, f.title
                 FROM Book f
                 WHERE f.inMyList=1
                 deltaImportQuery=SELECT
 f.id,f.title
                         FROM Book f
                       
  f.id='${dataimporter.delta.job_jobs_id}'
                 deltaQuery=SELECT id FROM
 `Book` WHERE fm.inMyList=1 AND
  lastModifiedDate 
 '${dataimporter.last_index_time}'  
 
            field column=id name=id /
            field column=title name=title
 /
                 entity name=category
  transformer=script:SplitAndPrettyCategory
 query=select fc.bookId,
  group_concat(cr.name) as categoryName,
                  from BookCat fc
                  where fc.bookId = '${item.id}'
 AND
                  group by fc.bookId
                  field
 column=categoryType name=categoryType /
                  /entity
     /entity
    /document
 
 
 
 
 
 
 -- 
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com
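
To make the diagnosis in this thread concrete, here is a toy model of placeholder substitution. It is not DIH code: the `resolve` function is invented, the id value is made up, and treating an unknown placeholder as an empty string is an assumption of the sketch. It shows why `${dataimporter.delta.job_jobs_id}` cannot match anything when the deltaQuery only returns a column named `id`.

```python
import re


def resolve(template, variables):
    """Replace ${name} placeholders; unknown names become '' (sketch assumption)."""
    return re.sub(
        r"\$\{([^}]+)\}",
        lambda match: str(variables.get(match.group(1), "")),
        template,
    )


# One row shaped like the deltaQuery result in the thread: only an `id` column.
delta_row = {"id": 42}  # 42 is a made-up value
variables = {f"dataimporter.delta.{col}": val for col, val in delta_row.items()}

wrong = resolve("f.id='${dataimporter.delta.job_jobs_id}'", variables)
right = resolve("f.id='${dataimporter.delta.id}'", variables)
print(wrong)
print(right)
```

The placeholder name after `dataimporter.delta.` has to be a column the deltaQuery actually selects.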



how to do partial word searches?

2009-11-24 Thread Joel Nylund

Hi, I saw some older postings on this, but didn't see a resolution.

I have a field called title, and I would like to be able to find partial  
word matches within the title.


For example:

http://localhost:8983/solr/select?q=textTitle:%22*sulli*%22

I would expect it to find:
<str name="textTitle">the daily dish | by andrew sullivan</str>

but it doesn't. It does find sully (which is fine with me as a  
bonus), but it doesn't seem to do any partial word matching. Oddly  
enough, before I lowercased the title, the wildcard matching seemed to  
work a bit better; it just didn't deal with the case-sensitive query.


At first I had mixed-case titles, and I read that wildcards don't  
work with mixed case, so I created another field called textTitle that  
is a lowercased version of the title; it is of type text.


Is it possible with Solr to achieve what I am trying to do, and if so, how?  
If not, is there anything closer than what I have?


thanks
Joel
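
One common way to get partial-word matching without wildcards is to index character n-grams of each token, so that a fragment like `sulli` becomes an ordinary indexed term. The sketch below is a toy illustration of that idea in plain Python, not Solr configuration; the function names and the gram sizes (3 to 5) are arbitrary choices made for the example.

```python
def ngrams(token, min_n=3, max_n=5):
    """All substrings of `token` with length between min_n and max_n."""
    return {
        token[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(token) - n + 1)
    }


def index_title(title):
    """Lowercase, split on whitespace, and collect n-grams of every token."""
    grams = set()
    for token in title.lower().split():
        grams |= ngrams(token)
    return grams


grams = index_title("the daily dish | by andrew sullivan")
print("sulli" in grams)  # fragment of "sullivan"
print("sully" in grams)  # not a fragment of any token
```

The trade-off is index size: every token contributes many grams, and a query fragment has to fall within the chosen length range to match.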



Re: configure solr

2009-11-24 Thread Joel Nylund
For #1: under example, is there a webapps folder, and does it contain  
solr.war? Are there any errors in your Jetty startup log? Does it  
say anything about setting up Solr, Solr home, etc.?


Joel

On Nov 24, 2009, at 4:55 PM, Jill Han wrote:


Hi,

I just downloaded solr -1.4.0 to my computer, C:\apache-solr-1.4.0.

1.I followed the instruction to run the sample, java -jar
start.jar at C:\apache-solr-1.4.0\example

And then go to http://localhost:8983/solr/admin, however, I got


HTTP ERROR: 404

   NOT_FOUND

RequestURI=/solr/admin

Powered by jetty:// http://jetty.mortbay.org

Did I miss something?

2.   Since I can't get sample run, I tried to run it on tomcat
server(5.5) directly as

a.   Copy/paste apache-solr-1.4.0.war to C:\Tomcat 5.5\webapps,

b.   Go to http://localhost:8080/apache-solr-1.4.0/

The error message is HTTP Status 500 - Severe errors in solr
configuration..

3.   How to configure it on tomcat server?

Your help is appreciated very much as always,

Jill









Re: help with dataimport delta query

2009-11-23 Thread Joel Nylund
Got to love it when Yahoo thinks your own mail is spam. Does anyone have  
any ideas how to get logging to work with 1.4?


I went to the admin panel and set all logging to FINEST.

In my Jetty stdout I see no SQL for any of the data import handler  
runs. I see:


Nov 23, 2009 9:26:27 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 6
Nov 23, 2009 9:26:32 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity category with URL: jdbc:mysql://localhost/feeddb
Nov 23, 2009 9:26:32 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 5


But no SQL. From looking at the source, it looks like it should be  
logging the SQL if I'm in debug mode.


Any ideas? I think I am losing my mind.

my full import works, but the delta does nothing

thanks
Joel



On Nov 23, 2009, at 2:49 PM, Joel Nylund wrote:

Hi, I have solr all working nicely, except im trying to get deltas  
to work on my data import handler


Here is a simplification of my data import config, I have a table  
called Book which has categories, im doing subquries for the  
category info and calling a javascript helper. This all works  
perfectly for the regular query.


I added these lines for the delta stuff:

deltaImportQuery=SELECT f.id,f.title
FROM Book f
f.id='${dataimporter.delta.job_jobs_id}'
		deltaQuery=SELECT id FROM `Book` WHERE fm.inMyList=1 AND  
lastModifiedDate  '${dataimporter.last_index_time}'  


basically im trying to rows that lastModifiedDate is newer than the  
last index (or deltaindex).


I run:
http://localhost:8983/solr/dataimport?command=delta-import

And it says in logs:

Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.DataImporter doDeltaImport

INFO: Starting Delta Import
Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties

INFO: Read dataimport.properties
Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.DocBuilder doDelta

INFO: Starting delta collection.
Nov 23, 2009 2:33:02 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport params={command=delta-import}  
status=0 QTime=0
Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.DocBuilder collectDelta

INFO: Running ModifiedRowKey() for Entity: category
Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.DocBuilder collectDelta

INFO: Completed ModifiedRowKey for Entity: category rows obtained : 0
Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.DocBuilder collectDelta

INFO: Completed DeletedRowKey for Entity: category rows obtained : 0
Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.DocBuilder collectDelta

INFO: Completed parentDeltaQuery for Entity: category
Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.DocBuilder collectDelta

INFO: Running ModifiedRowKey() for Entity: item
Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.DocBuilder collectDelta

INFO: Completed ModifiedRowKey for Entity: item rows obtained : 0
Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.DocBuilder collectDelta

INFO: Completed DeletedRowKey for Entity: item rows obtained : 0
Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.DocBuilder collectDelta

INFO: Completed parentDeltaQuery for Entity: item
Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.DocBuilder doDelta

INFO: Delta Import completed successfully
Nov 23, 2009 2:33:02 PM  
org.apache.solr.handler.dataimport.DocBuilder execute

INFO: Time taken = 0:0:0.21

But the browser says no documents added/modified (even though one  
record in db is a match)


Is there a way to turn debugging so I can see the queries the DIH is  
sending to the db?


Any other ideas of what I could be doing wrong?

thanks
Joel


<document name="doc">
   <entity name="item"
       query="SELECT f.id, f.title
              FROM Book f
              WHERE f.inMyList=1"
       deltaImportQuery="SELECT f.id,f.title
              FROM Book f
              f.id='${dataimporter.delta.job_jobs_id}'"
       deltaQuery="SELECT id FROM `Book` WHERE fm.inMyList=1 AND lastModifiedDate &gt; '${dataimporter.last_index_time}'">

      <field column="id" name="id" />
      <field column="title" name="title" />
      <entity name="category" transformer="script:SplitAndPrettyCategory"
          query="select fc.bookId, group_concat(cr.name) as categoryName,
                 from BookCat fc
                 where fc.bookId = '${item.id}' AND
                 group by fc.bookId">
         <field column="categoryType" name="categoryType" />
      </entity>
   </entity>
</document>






getting total index size & last update date/time from query

2009-11-19 Thread Joel Nylund

Hi,

Looking for total number of documents in my index and the last updated  
date/time of the index.


Is there a way to get this through the standard query q=?

if not, what is the best way to get this info from solr.

thanks
Joel



Re: deployment questions

2009-11-11 Thread Joel Nylund

Anyone?

I have done more reading and testing and it seems like I want to:

Use SolrJ and embed solr in my webapp, but I want to disable the http  
access to solr, meaning force all calls through my solrj interface I  
am building (no admin access etc).


Is there a simple way to do this?

Am I better off running solr as a server on its own and using network  
security?


thanks
Joel

On Nov 9, 2009, at 5:04 PM, Joel Nylund wrote:


Hi,

I have a java app that is deployed in jboss/tomcat container. I  
would like to add my solr index to it. I have read about this and it  
seems fairly straight forward, but im curious the best way to secure  
it.


I require my users to login to my app to use it, so I want the  
search functions to behave the same way. Ideally I would like to do  
the solr queries from the client using ajax/json calls.


So given this my thinking was I should wrapper the solr servlet and  
do a local proxy type interface to ensure security. Is there any  
easier way to do this, or an example of a good way to do this? Or  
does the solr servlet support a interceptor type pattern where I  
can have it call a piece of code before I execute the call (this  
application is old and not using std j2ee security so I dont think I  
can use that.)



Another option is to do solrj on the server, and not do the client  
side calls, in this case I think I could lock down the solr servlet  
interface to only allow local calls.


thanks
Joel





indexing on different server

2009-11-11 Thread Joel Nylund

is it possible to index on one server and copy the files over?

thanks
Joel



deployment questions

2009-11-09 Thread Joel Nylund

Hi,

I have a Java app that is deployed in a JBoss/Tomcat container. I would  
like to add my Solr index to it. I have read about this and it seems  
fairly straightforward, but I'm curious about the best way to secure it.


I require my users to log in to my app to use it, so I want the search  
functions to behave the same way. Ideally I would like to do the Solr  
queries from the client using Ajax/JSON calls.


Given this, my thinking was that I should wrap the Solr servlet in  
a local proxy-type interface to ensure security. Is there an easier  
way to do this, or an example of a good way to do it? Or does the  
Solr servlet support an interceptor-type pattern where I can have it  
call a piece of code before it executes the call? (This application is  
old and does not use standard J2EE security, so I don't think I can use that.)



Another option is to use SolrJ on the server and not make the client-side  
calls; in that case I think I could lock down the Solr servlet  
interface to allow only local calls.


thanks
Joel
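
The proxy idea described above can be sketched as a single function that sits between the logged-in client and Solr. Everything here is hypothetical: the whitelist, the row cap, and the function name are choices made for the example, and it is written in Python for brevity even though the application in question is Java. The same shape would apply inside a servlet or filter: check the session, keep only known-safe parameters, and build the Solr request server-side.

```python
from urllib.parse import urlencode

ALLOWED_PARAMS = {"q", "start", "rows", "sort"}  # hypothetical whitelist
MAX_ROWS = 100                                   # hypothetical cap


def build_solr_query(client_params, user_is_logged_in):
    """Turn untrusted client parameters into a query string safe to forward."""
    if not user_is_logged_in:
        raise PermissionError("login required")
    # Drop anything not explicitly allowed (e.g. qt, which selects a handler).
    safe = {k: v for k, v in client_params.items() if k in ALLOWED_PARAMS}
    safe["rows"] = min(int(safe.get("rows", 10)), MAX_ROWS)
    safe["wt"] = "json"  # always answer the Ajax client in JSON
    return urlencode(sorted(safe.items()))


print(build_solr_query({"q": "title:solr", "qt": "/admin/luke", "rows": "5000"}, True))
```

With this shape the Solr port itself only needs to be reachable from the application server, which matches the "local calls only" option mentioned in the message.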



Re: solr query help alpha numeric and not

2009-11-05 Thread Joel Nylund
Hi, yes, it's a string. In the case of a title it can be anything: a  
letter, a number, a symbol, a multibyte char, etc.


Any ideas for a query that matches anything that is not a letter a-z or a  
number 0-9, given that it's a string?


thanks
Joel

On Nov 4, 2009, at 9:10 AM, Jonathan Hendler wrote:


Hi Joel,

The ID is sent back as a string (instead of as an integer) in your  
example. Could this be the cause?


- Jonathan

On Nov 4, 2009, at 9:08 AM, Joel Nylund wrote:

Hi, I have a field called firstLetterTitle, this field has 1 char,  
it can be anything, I need help with a few queries on this char:


1.) I want all NON ALPHA and NON numbers, so any char that is not A- 
Z or 0-9


I tried:

http://localhost:8983/solr/select?q=NOT%20firstLetterTitle:0%20TO%209%20AND%20NOT%20firstLetterTitle:A%20TO%20Z

But I get back numeric results:

doc
str name=firstLetterTitle9/str
str name=id23946447/str
/doc


2.) I want all only Numerics:

http://localhost:8983/solr/select?q=firstLetterTitle:0%20TO%209

This seems to work but just checking if its the right way.



3.) I want only English letters:

http://localhost:8983/solr/select?q=firstLetterTitle:A%20TO%20Z

This seems to work but just checking if its the right way.


thanks
Joel







Re: solr query help alpha numeric and not

2009-11-05 Thread Joel Nylund
Avlesh, thanks, those worked. For some reason I never got your mail;  
I found it in one of the list archives though.


thanks again
Joel

On Nov 5, 2009, at 9:08 PM, Avlesh Singh wrote:


Didn't the queries in my reply work?

Cheers
Avlesh











solr query help alpha numeric and not

2009-11-04 Thread Joel Nylund
Hi, I have a field called firstLetterTitle, this field has 1 char, it  
can be anything, I need help with a few queries on this char:


1.) I want all NON ALPHA and NON numbers, so any char that is not A-Z  
or 0-9


I tried:

http://localhost:8983/solr/select?q=NOT%20firstLetterTitle:0%20TO%209%20AND%20NOT%20firstLetterTitle:A%20TO%20Z

But I get back numeric results:

doc
str name=firstLetterTitle9/str
str name=id23946447/str
/doc


2.) I want all only Numerics:

http://localhost:8983/solr/select?q=firstLetterTitle:0%20TO%209

This seems to work but just checking if its the right way.



3.) I want only English letters:

http://localhost:8983/solr/select?q=firstLetterTitle:A%20TO%20Z

This seems to work but just checking if its the right way.


thanks
Joel
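
In Lucene query syntax a range is written with square brackets, `field:[low TO high]`; without them, `TO` and the upper bound are parsed as separate terms. The three queries asked for above can be sketched as follows. The helper name is invented, the URL is the one from the thread, and on a string field the ranges compare lexicographically, so `[A TO Z]` as written covers uppercase letters only.

```python
from urllib.parse import urlencode

SELECT = "http://localhost:8983/solr/select"  # URL used in the thread
FIELD = "firstLetterTitle"


def range_clause(field, low, high):
    """Inclusive range in Lucene query syntax."""
    return f"{field}:[{low} TO {high}]"


digits = range_clause(FIELD, 0, 9)
letters = range_clause(FIELD, "A", "Z")
# Purely negative queries need something to subtract from, hence *:*.
neither = f"*:* -{digits} -{letters}"

for q in (digits, letters, neither):
    print(SELECT + "?" + urlencode({"q": q}))
```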



how to use ajax-solr - example?

2009-11-04 Thread Joel Nylund
Hi, I looked at the documentation and I have no idea how to get  
started. Can someone point me to, or show me, an example of how to send  
a query to a Solr server and paginate through the results using  
ajax-solr?


I would gladly write a blog tutorial on how to do this if someone can  
get me started.


I don't know jQuery but have used Prototype & scriptaculous.

thanks
Joel
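
Independently of which JavaScript library ends up issuing the request, pagination against Solr comes down to the `start` and `rows` parameters plus the `numFound` value in the response. The sketch below shows only that arithmetic; the helper names and the default page size are assumptions for illustration, and it says nothing about ajax-solr's own API.

```python
import math


def page_params(query, page, page_size=10):
    """Request parameters for a 1-based page number."""
    return {
        "q": query,
        "start": (page - 1) * page_size,  # offset of the first document
        "rows": page_size,                # documents per page
        "wt": "json",                     # JSON is convenient for Ajax clients
    }


def page_count(num_found, page_size=10):
    """Number of pages needed to show num_found documents."""
    return math.ceil(num_found / page_size)


print(page_params("feedClass:Blog", 3))
print(page_count(3451))
```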



exact match lookup

2009-11-04 Thread Joel Nylund

Hi,

I have a field that I want to do exact match lookups on.
(By exact match, I mean the equivalent of a SQL query with a plain  
where clause and no LIKE, e.g. where feedClass = 'Social News'.)


For example the field is called feedClass and I'm doing:

http://localhost:8983/solr/select?q=feedClass:Blog

http://localhost:8983/solr/select?q=feedClass:Social%20News

I tried using type text, and it seems to work pretty well except for  
classes with spaces in them.


So I tried using field type string, but that didn't work. Then I tried  
defining a new type:


<fieldType name="text_nows" class="solr.TextField" positionIncrementGap="100">
</fieldType>


This didn't seem to help either.

When I query this field for values containing spaces, I seem to get  
random results.


For example:

response
−
lst name=responseHeader
int name=status0/int
int name=QTime5/int
−
lst name=params
str name=qfeedClass:Social News/str
/lst
/lst
−
result name=response numFound=3451 start=0
−
doc
str name=feedClassBlog/str
str name=firstLetterTitleN/str
/doc


any ideas?

thanks
Joel



Re: exact match lookup

2009-11-04 Thread Joel Nylund

That worked for me. I changed to:

http://localhost:8983/solr/select?q=feedClass:%22social%20news%22

and the matches are correct. I changed the feedClass field back to  
type text.


A follow-up question has to do with sorting these results.

I have a field called title that I want the results sorted by.

http://localhost:8983/solr/select?q=feedClass:%22social%20news%22&sort:title%20asc

I tried this and the results are not sorted (they seem random).

Any ideas?

thanks
Joel


<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="q">feedClass:"social news"</str>
      <str name="sort:title asc"/>
    </lst>
  </lst>
  <result name="response" numFound="186" start="0">
    <doc>
      <str name="feedClass">Social News</str>
      <str name="firstLetterTitle">F</str>
      <str name="title">Far</str>
    </doc>
    <doc>
      <str name="feedClass">Social News</str>
      <str name="firstLetterTitle">D</str>
      <str name="title">dig</str>
    </doc>
    <doc>
      <str name="feedClass">Social News</str>
      <str name="firstLetterTitle">T</str>
      <str name="title">Tech</str>
    </doc>
    <doc>
      <str name="feedClass">Social News</str>
      <str name="firstLetterTitle">M</str>
      <str name="title">Mix</str>
    </doc>



On Nov 4, 2009, at 12:15 PM, Jérôme Etévé wrote:


Hi,
you need to quote your phrase when you search for 'Social News':

feedClass:"Social News" (URI encoded of course).

otherwise your request will become (I assume you're using a standard
query parser) feedClass:Social defaultField:News . Well that's the
idea.

It should then work using the type string.
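
A minimal sketch of building such a quoted, URI-encoded query in Java. The class and method names are made up for illustration, and it assumes Java 10 or later for the `URLEncoder.encode(String, Charset)` overload. It wraps the value in double quotes so the parser sees one phrase instead of `feedClass:Social` plus a default-field term.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Solr or SolrJ.
final class PhraseQueryUrl {

    // Builds <solrBase>/select?q=field:"value" with the q parameter URL-encoded.
    static String selectUrl(String solrBase, String field, String value) {
        // Escape backslashes and quotes so the value cannot close the phrase early.
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        String q = field + ":\"" + escaped + "\"";
        return solrBase + "/select?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8);
    }
}
```

Calling `PhraseQueryUrl.selectUrl("http://localhost:8983/solr", "feedClass", "Social News")` gives a URL whose q parameter is `feedClass%3A%22Social+News%22`. `URLEncoder` writes the space as `+`, which is equivalent to `%20` in a query string.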

Cheers!

J.


2009/11/4 Joel Nylund jnyl...@yahoo.com:

Hi,

I have a field that I want to do exact match lookups using.
(when I say exact match, im looking for equivalent to a sql query  
where with

no like clause so where feedClass = Social News)

For example the field is called feedClass and im doing:

http://localhost:8983/solr/select?q=feedClass:Blog

http://localhost:8983/solr/select?q=feedClass:Social%20News

I tried using text and it seems to work pretty well except for  
classes

with spaces in them.

So I tried using field type string, that didnt work. Then I tried  
defining a

new type called:

   fieldType name=text_nows class=solr.TextField
positionIncrementGap=100
  /fieldType


This didnt seem to help either.

When I do these queries for this field with spaces, I seem to get  
random

results

For example:

response
−
lst name=responseHeader
int name=status0/int
int name=QTime5/int
−
lst name=params
str name=qfeedClass:Social News/str
/lst
/lst
−
result name=response numFound=3451 start=0
−
doc
str name=feedClassBlog/str
str name=firstLetterTitleN/str
/doc


any ideas?

thanks
Joel






--
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net




Re: exact match lookup

2009-11-04 Thread Joel Nylund

that worked, thanks!

had to negate the score.

thanks
Joel

On Nov 4, 2009, at 1:57 PM, Jérôme Etévé wrote:


If feedClass acts as an identifier, better use string :)

use sort=title asc,score desc (not sort:)
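
As a rough sketch of the difference: `sort` is its own request parameter, joined to `q` with `&` and given a value with `=`. The helper below is hypothetical (not SolrJ) and assumes Java 10 or later; it only shows how the two parameters end up in the URL.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration only.
final class SortedSelectUrl {

    // Joins name=value pairs with '&', URL-encoding each value.
    static String build(String selectUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(p.getKey() + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return selectUrl + "?" + query;
    }

    // The query from this thread, with sort passed as a separate parameter.
    static String example() {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "feedClass:\"social news\"");
        params.put("sort", "title asc,score desc");
        return build("http://localhost:8983/solr/select", params);
    }
}
```

Note that sorting on a tokenized text field may still look odd; a string-typed (or otherwise single-token) copy of title is usually the safer field to sort on.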

J.

2009/11/4 Joel Nylund jnyl...@yahoo.com:

thank worked for me, changed to:

http://localhost:8983/solr/select?q=feedClass:%22social%20news%22

and the matches are correct, I changed the feedClass field back to  
type

text.

A followup question has to do with sorting these results.

I have a field called title that I want the results sorted by.

http://localhost:8983/solr/select?q=feedClass:%22social%20news%22sort:title%20asc

I tried this and the results are not sorted (they seem random)

any ideas?

thanks
Joel


response
−
lst name=responseHeader
int name=status0/int
int name=QTime1/int
−
lst name=params
str name=qfeedClass:social news/str
str name=sort:title asc/
/lst
/lst
−
result name=response numFound=186 start=0
−
doc
str name=feedClassSocial News/str
str name=firstLetterTitleF/str
str name=titleFar/str
/doc
doc
str name=feedClassSocial News/str
str name=firstLetterTitleD/str
str name=titledig/str
/doc
doc
str name=feedClassSocial News/str
str name=firstLetterTitleT/str
str name=titleTech/str
/doc
doc
str name=feedClassSocial News/str
str name=firstLetterTitleM/str
str name=titleMix/str
/doc



On Nov 4, 2009, at 12:15 PM, Jérôme Etévé wrote:


Hi,
you need to quote your phrase when you search for 'Social News':

feedClass:Social News (URI encoded of course).

otherwise your request will become (I assume you're using a standard
query parser) feedClass:Social defaultField:News . Well that's the
idea.

It should then work using the type string.

Cheers!

J.


2009/11/4 Joel Nylund jnyl...@yahoo.com:


Hi,

I have a field that I want to do exact match lookups using.
(when I say exact match, im looking for equivalent to a sql query  
where

with
no like clause so where feedClass = Social News)

For example the field is called feedClass and im doing:

http://localhost:8983/solr/select?q=feedClass:Blog

http://localhost:8983/solr/select?q=feedClass:Social%20News

I tried using text and it seems to work pretty well except for  
classes

with spaces in them.

So I tried using field type string, that didnt work. Then I tried
defining a
new type called:

 fieldType name=text_nows class=solr.TextField
positionIncrementGap=100
/fieldType


This didnt seem to help either.

When I do these queries for this field with spaces, I seem to get  
random

results

For example:

response
−
lst name=responseHeader
int name=status0/int
int name=QTime5/int
−
lst name=params
str name=qfeedClass:Social News/str
/lst
/lst
−
result name=response numFound=3451 start=0
−
doc
str name=feedClassBlog/str
str name=firstLetterTitleN/str
/doc


any ideas?

thanks
Joel






--
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net







--
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net




Re: how to use ajax-solr - example?

2009-11-04 Thread Joel Nylund

Hi Israel,

I agree the idea of adding a scripting language in between is good,  
but I want something simple I can easily test my queries with data and  
scroll through the results. I have been using the browser and getting  
xml for now, but would like to save my queries in a simple html page  
and format the data.


I figured this is something I can throw together in a few hours, but I  
also figured someone would have already done the work.


thanks
Joel

On Nov 4, 2009, at 2:02 PM, Israel Ekpo wrote:

On Wed, Nov 4, 2009 at 10:48 AM, Joel Nylund jnyl...@yahoo.com  
wrote:


Hi, I looked at the documentation and I have no idea how to get  
started?
Can someone point me to or show me an example of how to send a  
query to a

solr server and paginate through the results using ajax-solr.

I would gladly write a blog tutorial on how to do this if someone
can get me started.

I don't know jQuery but have used Prototype & scriptaculous.

thanks
Joel




Joel,

It will be best if you use a scripting language between Solr and
JavaScript.

This is because sending data directly between JavaScript and Solr will
limit you to only one domain name.

However, if you are using a scripting language between JavaScript
and Solr, you can use the scripting language to retrieve the request
parameters from JavaScript and then send them to Solr with the response
writer set to json.

This will cause Solr to send the response in JSON format, which the
scripting language can pass on to JavaScript.

This example here will cause Solr to return the response in JSON.

http://example.com:8443/solr/select?q=searchkeyword&wt=json
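
A small sketch of the middle layer's job, written in Java only for illustration (any server-side language would do). The class name and the whitelist of parameters are assumptions; `start` and `rows` are the standard Solr paging parameters, so forwarding them is enough to page through results. Assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns parameters received from the browser into a Solr URL.
final class SolrProxyParams {

    // Only these client parameters are passed through to Solr.
    private static final List<String> ALLOWED = List.of("q", "start", "rows");

    static String toSolrUrl(String solrSelectUrl, Map<String, String> clientParams) {
        StringBuilder url = new StringBuilder(solrSelectUrl).append('?');
        for (String name : ALLOWED) {
            String value = clientParams.get(name);
            if (value != null) {
                url.append(name).append('=')
                   .append(URLEncoder.encode(value, StandardCharsets.UTF_8))
                   .append('&');
            }
        }
        // Always force the JSON response writer, whatever the client asked for.
        return url.append("wt=json").toString();
    }
}
```

The server-side script would then fetch that URL and hand the JSON body back to the page unchanged.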


--
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.




Re: best way to model 1-N

2009-10-30 Thread Joel Nylund
Thanks, but I'm confused about how I can aggregate across rows. I don't
know of any easy way to get my DB to return one row for all the categories
(given the hint from your other email). I have split the category
query into a separate entity, but it's returning multiple rows. How do
I combine multiple rows into one index entity?


thanks
Joel

On Oct 29, 2009, at 8:58 PM, Avlesh Singh wrote:



In the database this is modeled as a 1-N where category table has the
mapping of feed to category
I need to be able to query , give me all the feeds in any given  
category.

How can I best model this in solr?
Seems like multiValued field might help, but how would I populate  
it, and

would the query above work?.


Yes you are right. A multivalued field for categories is the answer.

For populating in the index -

  1. If you use DIH to populate your indexes and your datasource is a
  database, then you can use DIH's RegexTransformer on an aggregated list of
  categories. E.g. if your database query returns "a,b,c,d" in a column called
  db_categories, this is how you would put it in DIH's data-config file
  (the enclosing entity also needs transformer="RegexTransformer"):

  <field column="db_categories" name="categories" splitBy="," />

  2. If you add documents to Solr yourself, multiple values for the field
  can be specified as an array or list of values in the SolrInputDocument.


A multivalued field provides the same faceting and searching capabilities
as regular fields. There is no special syntax.
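
For the second option, a minimal sketch of turning an aggregated column such as "a,b,c,d" into separate values before adding them to the document. The class name is made up; with SolrJ each returned value would then be passed to `SolrInputDocument.addField("categories", value)`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class CategorySplitter {

    // Splits a comma-separated list, trimming whitespace and dropping empty entries.
    static List<String> split(String aggregated) {
        List<String> values = new ArrayList<>();
        if (aggregated == null) {
            return values; // a feed with no category yields no values
        }
        for (String part : aggregated.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed);
            }
        }
        return values;
    }
}
```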

Cheers
Avlesh

On Fri, Oct 30, 2009 at 4:55 AM, Joel Nylund jnyl...@yahoo.com  
wrote:



Hi,

I have one index so far which contains feeds.  I have been able to
de-normalize several tables and map this data onto the feed entity.  
There is

one tricky problem that I need help on.

Feeds have 1 - many categories.

So Lets say we have Category1, Category2 and Category3

Feed 1 - is in Category 1
Feed 2 is in category2 and category3
Feed 3 is in category2
Feed 4 has no category

In the database this is modeled a a 1-N where category table has the
mapping of feed to category

I need to be able to query , give me all the feeds in any given  
category.


How can I best model this in solr?

Seems like multiValued field might help, but how would I populate  
it, and

would the query above work?.

thanks
Joel






Re: best way to model 1-N

2009-10-30 Thread Joel Nylund

Thanks Chantal, I will keep that in mind for tuning,

For SQL I figured out a way to combine them into one row using concat, but
I still seem to be having an issue splitting them:


Db now returns as one column categoryType:
TOPIC,LANGUAGE

but in my Solr result, the items in categoryType all seem to be within
one str. I would expect them to be in multiple strings within the array.
Is this assumption wrong?


<doc>
  <arr name="categoryType">
    <str>TOPIC,LANGUAGE</str>
  </arr>
  <str name="id">40</str>
  <str name="title">feed title</str>
</doc>


Here is my import:
  <document name="doc">
    <entity name="item"
            query="SELECT f.id, f.title
                   FROM Feed f">
      <field column="id" name="id" />
      <field column="title" name="title" />
      <entity name="category"
              query="select cfcr.feedId, group_concat(cfcr.categoryType) as categoryType
                     from CFR cfcr
                     where
                     cfcr.feedId = '${item.id}' AND
                     group by cfcr.feedId">
        <field column="categoryType" name="categoryType" splityBy="," />
      </entity>
    </entity>

In schema:
  <field name="categoryType" type="text" indexed="true" stored="true" required="false" multiValued="true"/>
  <field name="categoryName" type="text" indexed="true" stored="true" required="false" multiValued="true"/>



what am I missing?

thanks
Joel


On Oct 30, 2009, at 10:00 AM, Chantal Ackermann wrote:

That depends a bit on your database, but it is tricky and might not  
be performant.


If you are more of a Java developer, you might prefer retrieving
multiple rows per SOLR document from your dataSource (join on your
category and main table), and aggregating them in your custom
EntityProcessor. I got far(!) better performance retrieving
everything in one query and doing the aggregation in Java. But this
does, of course, depend on your table structure and data.
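
The aggregation step itself can be sketched in plain Java, independent of the EntityProcessor API (which is not shown here). The class name and the row layout, a two-element array of feed id and category as a join would return it, are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: collapses joined rows into one entry per feed.
final class RowAggregator {

    // Each row is {feedId, category}; category may be null for a feed without categories.
    static Map<String, List<String>> groupByFeed(List<String[]> rows) {
        Map<String, List<String>> byFeed = new LinkedHashMap<>(); // keeps first-seen order
        for (String[] row : rows) {
            List<String> categories = byFeed.computeIfAbsent(row[0], id -> new ArrayList<>());
            if (row[1] != null) {
                categories.add(row[1]);
            }
        }
        return byFeed;
    }
}
```

Each map entry then corresponds to one Solr document, with the list supplying the values of the multiValued category field.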


Noble Paul helped me with the custom EntityProcessor, and it turned
out to be quite easy. Have a look at the thread on this mailing list
(SOLR-USER) with the heading:
DataImportHandler / Import from DB : one data set comes in multiple
rows


Cheers,
Chantal


Joel Nylund schrieb:

thanks, but im confused how I can aggregate across rows, I dont know
of any easy way to get my db to return one row for all the categories
(given the hint from your other email), I have split the category
query into a separate entity, but its returning multiple rows, how do
I combine multiple rows into 1 index entity?
thanks
Joel
On Oct 29, 2009, at 8:58 PM, Avlesh Singh wrote:
In the database this is modeled a a 1-N where category table has  
the

mapping of feed to category
I need to be able to query , give me all the feeds in any given
category.
How can I best model this in solr?
Seems like multiValued field might help, but how would I populate
it, and
would the query above work?.

Yes you are right. A multivalued field for categories is the  
answer.


For populating in the index -

 1. If you use DIH to populate your indexes and your datasource is a
 database then you can use DIH's RegexTransformer on an aggregated
list of
 categories. e.g. if your database query retruns a,b,c,d in a
column called
 db_categories, this is how you would put it in DIH's data-config
file -
 field column=db_categories name=categories splityBy=, /.
 2. If you add documents to Solr yourself  multiple values for
the field
 can be specified as an array or list of values in the
SolrInputDocument.

A multivalued field provides the same faceting and searching
capabilites
like regular fields. There is no special syntax.

Cheers
Avlesh

On Fri, Oct 30, 2009 at 4:55 AM, Joel Nylund jnyl...@yahoo.com
wrote:


Hi,

I have one index so far which contains feeds.  I have been able to
de-normalize several tables and map this data onto the feed entity.
There is
one tricky problem that I need help on.

Feeds have 1 - many categories.

So Lets say we have Category1, Category2 and Category3

Feed 1 - is in Category 1
Feed 2 is in category2 and category3
Feed 3 is in category2
Feed 4 has no category

In the database this is modeled a a 1-N where category table has  
the

mapping of feed to category

I need to be able to query , give me all the feeds in any given
category.

How can I best model this in solr?

Seems like multiValued field might help, but how would I populate
it, and
would the query above work?.

thanks
Joel






Re: best way to model 1-N

2009-10-30 Thread Joel Nylund

I'm using apache-solr-1.3.0.

I got it to work using a JavaScript function instead.

thanks
Joel

On Oct 30, 2009, at 12:44 PM, Chantal Ackermann wrote:


This looks all right to me, but I might be missing something.
Which version/build of SOLR are you using?

Chantal

Joel Nylund schrieb:

Thanks Chantal, I will keep that in mind for tuning,
for sql I figured  way to combine them into one row using concat, but
I still seem to be having an issue splitting them:
Db now returns as one column categoryType:
TOPIC,LANGUAGE
but my solr result, if you note the item in categoryType  all seem to
be within one str, I would expect it to be in multiple strings within
the array, is this assumption wrong?
doc
−
arr name=categoryType
strTOPIC,LANGUAGE/str
/arr
str name=id40/str
str name=titlefeed title/str
/doc
Here is my import:
  document name=doc
entity name=item
   query=SELECT f.id, f.title
   FROM Feed f
   field column=id name=id /
field column=title name=title /
   entity name=category query=select  
cfcr.feedId,

group_concat(cfcr.categoryType) as categoryType
   from CFR cfcr
   where
   cfcr.feedId = '$ 
{item.id}' AND

   group by cfcr.feedId
   field column=categoryType  
name=categoryType

splityBy=, /
   /entity
 /entity
In schema:
   field name=categoryType type=text indexed=true  
stored=true

required=false multiValued=true/
   field name=categoryName type=text indexed=true  
stored=true

required=false multiValued=true/
what am I missing?
thanks
Joel
On Oct 30, 2009, at 10:00 AM, Chantal Ackermann wrote:

That depends a bit on your database, but it is tricky and might not
be performant.

If you are more of a Java developer, you might prefer retrieving
mutliple rows per SOLR document from your dataSource (join on your
category and main table), and aggregate them in your custom
EntityProcessor. I got a far(!) better performance retrieving
everything in one query and doing the aggregation in Java. But this
is, of course, depending on your table structure and data.

Noble Paul helped me with the custom EntityProcessor, and it turned
out quite easy. Have a look at the thread with the heading from this
mailing list (SOLR-USER):
DataImportHandler / Import from DB : one data set comes in multiple
rows

Cheers,
Chantal


Joel Nylund schrieb:
thanks, but im confused how I can aggregate across rows, I dont  
know
of any easy way to get my db to return one row for all the  
categories

(given the hint from your other email), I have split the category
query into a separate entity, but its returning multiple rows,  
how do

I combine multiple rows into 1 index entity?
thanks
Joel
On Oct 29, 2009, at 8:58 PM, Avlesh Singh wrote:

In the database this is modeled a a 1-N where category table has
the
mapping of feed to category
I need to be able to query , give me all the feeds in any given
category.
How can I best model this in solr?
Seems like multiValued field might help, but how would I populate
it, and
would the query above work?.


Yes you are right. A multivalued field for categories is the
answer.

For populating in the index -

1. If you use DIH to populate your indexes and your datasource  
is a

database then you can use DIH's RegexTransformer on an aggregated
list of
categories. e.g. if your database query retruns a,b,c,d in a
column called
db_categories, this is how you would put it in DIH's data-config
file -
field column=db_categories name=categories splityBy=, /.
2. If you add documents to Solr yourself  multiple values for
the field
can be specified as an array or list of values in the
SolrInputDocument.

A multivalued field provides the same faceting and searching
capabilites
like regular fields. There is no special syntax.

Cheers
Avlesh

On Fri, Oct 30, 2009 at 4:55 AM, Joel Nylund jnyl...@yahoo.com
wrote:


Hi,

I have one index so far which contains feeds.  I have been able  
to
de-normalize several tables and map this data onto the feed  
entity.

There is
one tricky problem that I need help on.

Feeds have 1 - many categories.

So Lets say we have Category1, Category2 and Category3

Feed 1 - is in Category 1
Feed 2 is in category2 and category3
Feed 3 is in category2
Feed 4 has no category

In the database this is modeled a a 1-N where category table has
the
mapping of feed to category

I need to be able to query , give me all the feeds in any given
category.

How can I best model this in solr?

Seems like multiValued field might help, but how would I populate
it, and
would the query above work?.

thanks
Joel






Re: weird problem with letters S and T

2009-10-29 Thread Joel Nylund
Hey everyone, thanks for the help. It seems to be working this morning
after a restart & reindex (maybe I was just too sleepy last night), and
using a field type of text_ws.


I'm curious about the pros and cons of Michel's approach below. This
seems like another good way to do it. Is there any difference in terms
of performance and/or index size, or anything else I need to worry
about? My index will have about 3 million records in prod; I'm testing
with 300k (1/10 scale) now and it seems fine.


thanks
Joel

On Oct 29, 2009, at 8:09 AM, Michel Bottan wrote:


Hi Joel,

If you intend to query for TITLEs that start with specific letters, I
have another solution which seems to be easier, since you don't need a
specific field for the first letter.

1. Create a new type in your schema.xml using the following analyzer

   <fieldType name="text_sort" class="solr.TextField" positionIncrementGap="100">
     <analyzer>
       <tokenizer class="solr.KeywordTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.ISOLatin1AccentFilterFactory"/>
       <filter class="solr.TrimFilterFactory"/>
       <filter class="solr.PatternReplaceFilterFactory"
               pattern="([^a-zA-Z0-9])" replacement="" replace="all"/>
     </analyzer>
   </fieldType>

2. Create a copy field from its original

   <field name="title_sort" type="text_sort" indexed="true" stored="false"/>

   <copyField source="title" dest="title_sort"/>

3. Use Filter Query to filter

i.e. fq=title_sort:[a TO b]&sort=title_sort asc (titles starting from A to N)



4. Read field value for presentation from the original field
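
To see what ends up in title_sort, the analyzer chain from step 1 can be approximated in plain Java. This is only a sketch: the class name is made up, and Unicode NFD decomposition stands in for ISOLatin1AccentFilterFactory, which handles some characters differently.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical approximation of the text_sort analyzer; not Solr code.
final class SortKey {

    static String of(String title) {
        // LowerCaseFilterFactory
        String lower = title.toLowerCase(Locale.ROOT);
        // Rough stand-in for the accent filter: decompose, then drop combining marks.
        String folded = Normalizer.normalize(lower, Normalizer.Form.NFD)
                                  .replaceAll("\\p{M}+", "");
        // TrimFilterFactory, then PatternReplaceFilterFactory with ([^a-zA-Z0-9]) -> ""
        return folded.trim().replaceAll("[^a-zA-Z0-9]", "");
    }
}
```

Because KeywordTokenizerFactory keeps the whole title as one token, every title yields exactly one key, which is what makes the field usable for sorting and for range filters such as [a TO b].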

Cheers!
Michel Bottan

On Thu, Oct 29, 2009 at 1:23 AM, Norberto Meijome  
numard...@gmail.comwrote:



On Wed, 28 Oct 2009 19:20:37 -0400
Joel Nylund jnyl...@yahoo.com wrote:


Well I tried removing those 2 letters from stopwords, didnt seem to
help, I also tried changing the field type to text_ws, didnt  
seem to

work. Any other ideas?



Hi Joel,
if your stop word filter was applied on index, you will have to reindex
again (at least those documents with S and T).

If your stop filter was *only* on query, then it should work after you
reloaded your app.

b

_
{Beto|Norberto|Numard} Meijome

Those who do not remember the past are condemned to repeat it.
 George Santayana

I speak for myself, not my employer. Contents may be hot. Slippery  
when
wet. Reading disclaimers makes you go blind. Writing them is worse.  
You have

been Warned.





data import with transformer

2009-10-29 Thread Joel Nylund
Hi, I have been reading the Solr book and wiki, but I can't find any
examples similar to what I'm looking for.


I have a database field called category. This field needs some text
manipulation before it goes in the index.


Here is the Java code for what I'm trying to do:

// Categories look like this: "prefix category suffix".
// I want to turn them into "category": remove the prefix and suffix and
// the spaces before and after.
// startString and endString are defined elsewhere in the class (not shown).

 public static String getPrettyCategoryName(String categoryName)
 {
     String result;

     if (categoryName == null || categoryName.equals(""))
     {
         // nothing to do; just return what was passed in.
         result = categoryName;
     }
     else
     {
         result = categoryName.toLowerCase();

         if (result.startsWith(startString))
         {
             result = result.substring(startString.length());
         }

         if (result.endsWith(endString))
         {
             result = result.substring(0, result.length() - endString.length());
         }

         // drop the spaces left before and after the category
         result = result.trim();

         if (result.length() > 0)
         {
             // capitalize the first letter
             result = Character.toUpperCase(result.charAt(0))
                     + result.substring(1);
         }
     }

     return result;
 }


Can I have a transformer call a java method?

It seems like I can, but how do I transform just one column? If
someone can point me to a complete example that transforms a column
using Java or JavaScript, I'm sure I can figure this out.
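
One possible shape for this, as a hedged sketch: to my knowledge DataImportHandler accepts a custom transformer class that simply exposes a `transformRow(Map<String, Object> row)` method and is named in the entity's `transformer` attribute, but check the DIH documentation for your version. The class name and the `START`/`END` values below are placeholders, since the real prefix and suffix are not shown above. Only the category column is touched; every other column passes through unchanged.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical transformer; declare it public in its own file when deploying it.
final class PrettyCategoryTransformer {

    // Placeholder markers (lowercase, because the value is lowercased first).
    static final String START = "prefix";
    static final String END = "suffix";

    // Called once per row; rewrites only the "category" column.
    public Object transformRow(Map<String, Object> row) {
        Object raw = row.get("category");
        if (raw != null) {
            row.put("category", pretty(raw.toString()));
        }
        return row;
    }

    static String pretty(String name) {
        if (name == null || name.isEmpty()) {
            return name;
        }
        String result = name.toLowerCase(Locale.ROOT).trim();
        if (result.startsWith(START)) {
            result = result.substring(START.length());
        }
        if (result.endsWith(END)) {
            result = result.substring(0, result.length() - END.length());
        }
        result = result.trim();
        if (!result.isEmpty()) {
            result = Character.toUpperCase(result.charAt(0)) + result.substring(1);
        }
        return result;
    }
}
```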



thanks
Joel



multiple sql queries for one index?

2009-10-29 Thread Joel Nylund
Hi, it's been hurting my brain all day to try to build one query for my
index (joins upon joins upon joins). Is there a way I can do multiple
queries to populate the same index? I have one main table that I can
join everything back to via ID, so it should be theoretically possible.


If this can be done, can someone point me to an example?

thanks
Joel



best way to model 1-N

2009-10-29 Thread Joel Nylund

Hi,

I have one index so far which contains feeds.  I have been able to
denormalize several tables and map this data onto the feed entity. There
is one tricky problem that I need help on.


Feeds have 1 - many categories.

So Lets say we have Category1, Category2 and Category3

Feed 1 - is in Category 1
Feed 2 is in category2 and category3
Feed 3 is in category2
Feed 4 has no category

In the database this is modeled as a 1-N, where the category table has the
mapping of feed to category.


I need to be able to query , give me all the feeds in any given  
category.


How can I best model this in solr?

Seems like multiValued field might help, but how would I populate it,  
and would the query above work?.


thanks
Joel



weird problem with letters S and T

2009-10-28 Thread Joel Nylund

(I am super new to solr, sorry if this is an easy one)

Hi, I want to support an A-Z type view of my data.

I have a DataImportHandler that uses SQL (my query is complex, but the
part that matters is below):


SELECT f.id, f.title, LEFT(f.title,1) as firstLetterTitle FROM Foo f

I can create this index with no issues.

I can query the title with no problem:

http://localhost:8983/solr/select?q=title:super

I can query the first letters mostly with no problem:

http://localhost:8983/solr/select?q=firstLetterTitle:a

Returns all the foo's with the first letter a.

This actually works with every letter except S and T

If I query those, I get no results. The weird thing is that if I do the
title query above with Super I get lots of results, and the XML shows the
firstLetterTitles for those to be S:


<doc>
  <str name="firstLetterTitle">S</str>
  <str name="id">84861348</str>
  <str name="title">Super Cool</str>
</doc>
<doc>
  <str name="firstLetterTitle">S</str>
  <str name="id">108692</str>
  <str name="title">Super 45</str>
</doc>
<doc>

etc.

Any ideas, are S and T special chars in query for solr?

here is the response from the s query with debug = true

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">24</int>
    <lst name="params">
      <str name="q">firstLetterTitle:s</str>
      <str name="debugQuery">true</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="debug">
    <str name="rawquerystring">firstLetterTitle:s</str>
    <str name="querystring">firstLetterTitle:s</str>
    <str name="parsedquery"/>
    <str name="parsedquery_toString"/>
    <lst name="explain"/>
    <str name="QParser">OldLuceneQParser</str>
    <lst name="timing">
      <double name="time">2.0</double>
      <lst name="prepare">
        <double name="time">1.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent">
          <double name="time">1.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.FacetComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.DebugComponent">
          <double name="time">0.0</double>
        </lst>
      </lst>
      <lst name="process">
        <double name="time">0.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.FacetComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.DebugComponent">
          <double name="time">0.0</double>
        </lst>
      </lst>
    </lst>
  </lst>
</response>



thanks
Joel



Re: weird problem with letters S and T

2009-10-28 Thread Joel Nylund
Thanks Bern, now that you mention it, they are in there. I assume if I
remove them it will work, but I probably don't want to do that, right?


Is there a way for this particular query to ignore stopwords?

thanks
Joel

On Oct 28, 2009, at 6:20 PM, Bernadette Houghton wrote:

Hi Joel, I had a similar issue the other day; in my case the  
solution turned out to be that the letters were stopwords. Don't  
know if this is your answer, but worth checking.

Bern

-Original Message-
From: Joel Nylund [mailto:jnyl...@yahoo.com]
Sent: Thursday, 29 October 2009 9:17 AM
To: solr-user@lucene.apache.org
Subject: weird problem with letters S and T

(I am super new to solr, sorry if this is an easy one)

Hi, I want to support an A-Z type view of my data.

I have a DataImportHandler that uses sql (my query is complex, but the
part that matters is:

SELECT f.id, f.title, LEFT(f.title,1) as firstLetterTitle FROM Foo f

I can create this index with no issues.

I can query the title with no problem:

http://localhost:8983/solr/select?q=title:super

I can query the first letters mostly with no problem:

http://localhost:8983/solr/select?q=firstLetterTitle:a

Returns all the foo's with the first letter a.

This actually works with every letter except S and T

If I query those, I get no results. The weird thing if I do the title
query above with Super I get lots of results, and the xml shoes the
firstLetterTitles for those to be S

doc
str name=firstLetterTitleS/str
str name=id84861348/str
str name=titleSuper Cool/str
/doc
−
doc
str name=firstLetterTitleS/str
str name=id108692/str
str name=titleSuper 45/str
/doc
−
doc

etc.

Any ideas, are S and T special chars in query for solr?

here is the response from the s query with debug = true

response
−
lst name=responseHeader
int name=status0/int
int name=QTime24/int
−
lst name=params
str name=qfirstLetterTitle:s/str
str name=debugQuerytrue/str
/lst
/lst
result name=response numFound=0 start=0/
−
lst name=debug
str name=rawquerystringfirstLetterTitle:s/str
str name=querystringfirstLetterTitle:s/str
str name=parsedquery/
str name=parsedquery_toString/
lst name=explain/
str name=QParserOldLuceneQParser/str
−
lst name=timing
double name=time2.0/double
−
lst name=prepare
double name=time1.0/double
−
lst name=org.apache.solr.handler.component.QueryComponent
double name=time1.0/double
/lst
−
lst name=org.apache.solr.handler.component.FacetComponent
double name=time0.0/double
/lst
−
lst name=org.apache.solr.handler.component.MoreLikeThisComponent
double name=time0.0/double
/lst
−
lst name=org.apache.solr.handler.component.HighlightComponent
double name=time0.0/double
/lst
−
lst name=org.apache.solr.handler.component.DebugComponent
double name=time0.0/double
/lst
/lst
−
lst name=process
double name=time0.0/double
−
lst name=org.apache.solr.handler.component.QueryComponent
double name=time0.0/double
/lst
−
lst name=org.apache.solr.handler.component.FacetComponent
double name=time0.0/double
/lst
−
lst name=org.apache.solr.handler.component.MoreLikeThisComponent
double name=time0.0/double
/lst
−
lst name=org.apache.solr.handler.component.HighlightComponent
double name=time0.0/double
/lst
−
lst name=org.apache.solr.handler.component.DebugComponent
double name=time0.0/double
/lst
/lst
/lst
/lst
/response



thanks
Joel





Re: weird problem with letters S and T

2009-10-28 Thread Joel Nylund
Well, I tried removing those 2 letters from stopwords; that didn't seem to
help. I also tried changing the field type to text_ws; that didn't seem to
work either. Any other ideas?


thanks
Joel

On Oct 28, 2009, at 6:42 PM, Martijn v Groningen wrote:


I think that is not a problem, because you are only storing one
character per field. There are other text field types that do not have
the stop word filter, so give your first letter field one of those types.
In this way the stop word filter is only disabled for searches on
the first letter field.
Cheers,

Martijn

2009/10/28 Joel Nylund jnyl...@yahoo.com:
Thanks Bern, now that you mention it they are in there, I assume if  
I remove

them it will work, but I probably dont want to do that right?

Is there a way for this particular query to ignore stopwords

thanks
Joel

On Oct 28, 2009, at 6:20 PM, Bernadette Houghton wrote:

Hi Joel, I had a similar issue the other day; in my case the  
solution
turned out to be that the letters were stopwords. Don't know if  
this is your

answer, but worth checking.
Bern

-Original Message-
From: Joel Nylund [mailto:jnyl...@yahoo.com]
Sent: Thursday, 29 October 2009 9:17 AM
To: solr-user@lucene.apache.org
Subject: weird problem with letters S and T

(I am super new to solr, sorry if this is an easy one)

Hi, I want to support an A-Z type view of my data.

I have a DataImportHandler that uses sql (my query is complex, but  
the

part that matters is:

SELECT f.id, f.title, LEFT(f.title,1) as firstLetterTitle FROM Foo f

I can create this index with no issues.

I can query the title with no problem:

http://localhost:8983/solr/select?q=title:super

I can query the first letters mostly with no problem:

http://localhost:8983/solr/select?q=firstLetterTitle:a

Returns all the foo's with the first letter a.

This actually works with every letter except S and T

If I query those, I get no results. The weird thing if I do the  
title
query above with Super I get lots of results, and the xml shoes  
the

firstLetterTitles for those to be S

doc
str name=firstLetterTitleS/str
str name=id84861348/str
str name=titleSuper Cool/str
/doc
−
doc
str name=firstLetterTitleS/str
str name=id108692/str
str name=titleSuper 45/str
/doc
−
doc

etc.

Any ideas, are S and T special chars in query for solr?

here is the response from the s query with debug = true

response
−
lst name=responseHeader
int name=status0/int
int name=QTime24/int
−
lst name=params
str name=qfirstLetterTitle:s/str
str name=debugQuerytrue/str
/lst
/lst
result name=response numFound=0 start=0/
−
lst name=debug
str name=rawquerystringfirstLetterTitle:s/str
str name=querystringfirstLetterTitle:s/str
str name=parsedquery/
str name=parsedquery_toString/
lst name=explain/
str name=QParserOldLuceneQParser/str
−
lst name=timing
double name=time2.0/double
−
lst name=prepare
double name=time1.0/double
−
lst name=org.apache.solr.handler.component.QueryComponent
double name=time1.0/double
/lst
−
lst name=org.apache.solr.handler.component.FacetComponent
double name=time0.0/double
/lst
−
lst name=org.apache.solr.handler.component.MoreLikeThisComponent
double name=time0.0/double
/lst
−
lst name=org.apache.solr.handler.component.HighlightComponent
double name=time0.0/double
/lst
−
lst name=org.apache.solr.handler.component.DebugComponent
double name=time0.0/double
/lst
/lst
−
lst name=process
double name=time0.0/double
−
lst name=org.apache.solr.handler.component.QueryComponent
double name=time0.0/double
/lst
−
lst name=org.apache.solr.handler.component.FacetComponent
double name=time0.0/double
/lst
−
lst name=org.apache.solr.handler.component.MoreLikeThisComponent
double name=time0.0/double
/lst
−
lst name=org.apache.solr.handler.component.HighlightComponent
double name=time0.0/double
/lst
−
lst name=org.apache.solr.handler.component.DebugComponent
double name=time0.0/double
/lst
/lst
/lst
/lst
/response



thanks
Joel