Re: Facet filter: how to specify OR expression?

2011-05-12 Thread Grijesh
How about fq=docType:(pdf OR txt)
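A sketch of what that filter looks like in a full request URL (Python used only for URL-encoding; the host, handler path, and the extra q/facet parameters are assumptions, not from the original mail):

```python
from urllib.parse import urlencode

# Hypothetical request: the parenthesised clause is a Boolean OR over one field,
# so the filter matches docs whose docType is pdf OR txt.
params = {
    "q": "*:*",
    "fq": "docType:(pdf OR txt)",
    "facet": "true",
    "facet.field": "docType",
}
query_string = urlencode(params)
url = "http://localhost:8983/solr/select?" + query_string
print(url)
```

Note the uppercase OR: the Lucene query parser treats lowercase `or` as a search term, not an operator.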

-
Thanx: 
Grijesh 
www.gettinhahead.co.in 
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-filter-how-to-specify-OR-expression-tp2930570p2930648.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: K-Stemmer for Solr 3.1

2011-05-12 Thread Bernd Fehling


On 12.05.2011 02:05, Mark wrote:

It appears that the older version of the Lucid Works KStemmer is incompatible 
with Solr 3.1. Has anyone been able to get this to work? If not,
what are you using as an alternative?

Thanks


Lucid KStemmer works fine with Solr 3.1 after some minor mods to
KStemFilter.java and KStemFilterFactory.java.
What problems do you have?

Bernd


Applying SOLR-236 field collapse patch to Solr 3.1.0

2011-05-12 Thread karan singh

I've been trying to install the field collapse patch to Solr 3.1.0 using the
following link:

https://issues.apache.org/jira/browse/SOLR-236

However, I'm not entirely sure which patch to download. How do I decide on
that? Also, as I understand it, I have to cd into the apache-solr-3.1.0
directory and then execute patch -p1 < /pathtopatchfile. Is that correct?
  

Re: Indexing Mails

2011-05-12 Thread Chandan Tamrakar
What kind of emails do you want to parse? MS emails?

You could integrate Apache Tika, but it depends on what kinds of emails the
Tika parser is able to handle.

You can define the fields that should be parsed and declare them in your xml
schema.

thanks

On Tue, May 10, 2011 at 2:07 PM, Jörg Agatz joerg.ag...@googlemail.com wrote:

 Will the e-mail ID and the recent e-mail IDs be indexed too?

 And which fields do I have to create in schema.xml?




-- 
Chandan Tamrakar


Re: Is it possible to build Solr as a maven project?

2011-05-12 Thread Gabriele Kahlout
On Tue, May 10, 2011 at 3:56 PM, Gabriele Kahlout
gabri...@mysimpatico.com wrote:



 On Tue, May 10, 2011 at 3:50 PM, Steven A Rowe sar...@syr.edu wrote:

 Hi Gabriele,

 There are some Maven instructions here (not in Lucene/Solr 3.1 because I
 just wrote the file a couple of days ago):
 
 http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/dev-tools/maven/README.maven
 

 My recommendation, since the Solr 3.1 source tarball does not include
 dev-tools/, is to check out the 3.1-tagged sources from Subversion:

 svn co http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1

 and then follow the instructions in the above-linked README.maven.  I did
 that just now and it worked for me.  The results are in solr/package/maven/.


 I did that and I think it worked for me, but I didn't get Nutch to work
 with it, so I preferred to revert to what is officially supported (not even,
 but...).

 I'll keep trying and report back.


Everything worked! These are the revisions I used:

$ svn co -r 1101526
http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1 solr 1086822
$ svn co -r 1101540
http://svn.apache.org/repos/asf/nutch/branches/branch-1.3 nutch



 Thank you






 Please write back if you run into any problems.

 Steve


 From: Gabriele Kahlout [mailto:gabri...@mysimpatico.com]
 Sent: Tuesday, May 10, 2011 8:37 AM
 To: boutr...@gmail.com
 Cc: solr-user@lucene.apache.org; Steven A Rowe; ryan...@gmail.com
 Subject: Re: Is it possible to build Solr as a maven project?


 sorry, this was not the target I used (this one should work too, but...),

 Can we expand on the but...?

 $ wget http://apache.panu.it/lucene/solr/3.1.0/apache-solr-3.1.0-src.tgz
 $ tar xf apache-solr-3.1.0-src.tgz
 $ cd apache-solr-3.1.0
 $ ant generate-maven-artifacts
 generate-maven-artifacts:

 get-maven-poms:

 BUILD FAILED
 /Users/simpatico/Downloads/apache-solr-3.1.0/build.xml:59: The following
 error occurred while executing this line:
 /Users/simpatico/Downloads/apache-solr-3.1.0/lucene/build.xml:445: The
 following error occurred while executing this line:
 /Users/simpatico/Downloads/apache-solr-3.1.0/build.xml:45:
 /Users/simpatico/Downloads/apache-solr-3.1.0/dev-tools/maven does not exist.



 Now for those that build this, it must have worked sometime. How? Or is
 this a bug in the release?
 Looking at the revision history of the build script, I might be referring to
 LUCENE-2490 (https://issues.apache.org/jira/browse/LUCENE-2490), but I'm
 not sure I understand the solution. I've checked out dev-tools, but even
 with it things don't work (tried the one from the 3.1.0 release).




 the one I used is get-maven-poms. That will just create pom files and copy
 them to their right target locations.

 I'm using netbeans and I'm using the plugin Automatic Projects to do
 everything inside the IDE.

 Which version of Solr are you using ?

 Ludovic.

 2011/5/4 Gabriele Kahlout [via Lucene]
 ml-node+2898211-2124746009-383...@n3.nabble.com

  generate-maven-artifacts:
 [mkdir] Created dir: /Users/simpatico/SOLR_HOME/build/maven
 [mkdir] Created dir: /Users/simpatico/SOLR_HOME/dist/maven
  [copy] Copying 1 file to
  /Users/simpatico/SOLR_HOME/build/maven/src/maven
  [artifact:install-provider] Installing provider:
  org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-2
 
  *BUILD FAILED*
  /Users/simpatico/SOLR_HOME/*build.xml:800*: The following error occurred
  while executing this line:
  /Users/simpatico/SOLR_HOME/common-build.xml:274: artifact:deploy doesn't
  support the uniqueVersion attribute
 
 
  *build.xml:800:* <m2-deploy pom.xml="src/maven/solr-parent-pom.xml.template"/>
 
  Removed the uniqueVersion attribute:
 
  generate-maven-artifacts:
  [artifact:install-provider] Installing provider:
  org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-2
  [artifact:deploy] Deploying to
file:///Users/simpatico/SOLR_HOME/dist/maven
 
  [artifact:deploy] [INFO] Retrieving previous build number from remote
  [artifact:deploy] [INFO] Retrieving previous metadata from remote
  [artifact:deploy] [INFO] Uploading repository metadata for: 'artifact
  org.apache.solr:solr-parent'
  [artifact:deploy] [INFO] Retrieving previous metadata from remote
  [artifact:deploy] [INFO] Uploading repository metadata for: 'snapshot
  org.apache.solr:solr-parent:1.4.2-SNAPSHOT'
   [copy] Copying 1 file to /Users/simpatico/SOLR_HOME/build/maven/lib
  [artifact:install-provider] Installing provider:
  org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-2
  [artifact:deploy] Deploying to
file:///Users/simpatico/SOLR_HOME/dist/maven
 
  [artifact:deploy] [INFO] Retrieving previous build number from remote
  [artifact:deploy] [INFO] Retrieving previous metadata from remote
  [artifact:deploy] [INFO] Uploading repository metadata for: 

Re: Facet filter: how to specify OR expression?

2011-05-12 Thread cnyee
It works. Many thanks.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-filter-how-to-specify-OR-expression-tp2930570p2930783.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facet filter: how to specify OR expression?

2011-05-12 Thread cnyee
I have another facet that is of type integer and it gave an exception.

Is it true that the field has to be of type string or text for the OR
expression to work?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-filter-how-to-specify-OR-expression-tp2930570p2930863.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facet Count Based on Dates

2011-05-12 Thread Jasneet Sabharwal
But Pivot Faceting is a feature of Solr 4.0, and I am using 3.1 since that
is a stable build and I can't use a nightly build.


The question was:

I have a schema with a field Polarity, which is of type text and can have
three values (0, 1 or -1), and a field CreatedAt, which is of type date.

How can I get the count of Polarity values grouped by date? For example,
output saying that on 5/1/2011 there were 10 counts of 0, 10 counts of 1,
and 10 counts of -1.

If I use the facet query like this :-

http://localhost:8983/solr/select/?q=*:*&facet=true&facet.field=Polarity

Then I get the count of the complete database

<lst name="Polarity">
  <int name="0">531477</int>
  <int name="1">530682</int>
</lst>

The query:
http://localhost:8983/solr/select/?q=*:*%20AND%20CreatedAt:[2011-03-10T00:00:00Z%20TO%202011-03-18T23:59:59Z]&facet=true&facet.date=CreatedAt&facet.date.start=2011-03-10T00:00:00Z&facet.date.end=2011-03-18T23:59:59Z&facet.date.gap=%2B1DAY


Would give me the count of data per day, like this:

<lst name="CreatedAt">
  <int name="2011-03-10T00:00:00Z">0</int>
  <int name="2011-03-11T00:00:00Z">276262</int>
  <int name="2011-03-12T00:00:00Z">183929</int>
  <int name="2011-03-13T00:00:00Z">196853</int>
  <int name="2011-03-14T00:00:00Z">2967</int>
  <int name="2011-03-15T00:00:00Z">22762</int>
  <int name="2011-03-16T00:00:00Z">11299</int>
  <int name="2011-03-17T00:00:00Z">37433</int>
  <int name="2011-03-18T00:00:00Z">14359</int>
  <str name="gap">+1DAY</str>
  <date name="start">2011-03-10T00:00:00Z</date>
  <date name="end">2011-03-19T00:00:00Z</date>
</lst>

How can I get the Polarity count for each date, like this:

2011-03-10T00:00:00Z
Polarity
0 = 100
1 = 500
-1 = 200
2011-03-11T00:00:00Z
Polarity
0=100
1=500
-1=200

And so on till the date range ends.


On 10-05-2011 15:51, Grijesh wrote:

Have you looked at Pivot Faceting
http://wiki.apache.org/solr/HierarchicalFaceting
http://wiki.apache.org/solr/SimpleFacetParameters#Pivot_.28ie_Decision_Tree.29_Faceting-1

-
Thanx:
Grijesh
www.gettinhahead.co.in
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-Count-Based-on-Dates-tp2922371p2922541.html
Sent from the Solr - User mailing list archive at Nabble.com.




--
Regards

Jasneet Sabharwal
Software Developer
NextGen Invent Corporation
+91-9871228582



Re: Facet filter: how to specify OR expression?

2011-05-12 Thread Grijesh
No, the OR operator should work for any data type.

-
Thanx: 
Grijesh 
www.gettinhahead.co.in 
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-filter-how-to-specify-OR-expression-tp2930570p2930915.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facet Count Based on Dates

2011-05-12 Thread Grijesh
You can apply the patch for hierarchical faceting on Solr 3.1.
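If patching is not an option, a client-side workaround (a sketch under assumptions, not an official feature) is to send one date-facet request per Polarity value, using fq to restrict each request; the host, fields, and date range below are taken from the question:

```python
from urllib.parse import urlencode

# Common date-facet parameters, shared by all three requests.
date_facet = {
    "q": "*:*",
    "facet": "true",
    "facet.date": "CreatedAt",
    "facet.date.start": "2011-03-10T00:00:00Z",
    "facet.date.end": "2011-03-18T23:59:59Z",
    "facet.date.gap": "+1DAY",
}

base = "http://localhost:8983/solr/select/?"
urls = []
for polarity in ("0", "1", "-1"):
    # Restrict each request to one Polarity value via a filter query.
    params = dict(date_facet, fq="Polarity:%s" % polarity)
    urls.append(base + urlencode(params))
```

Each URL returns the per-day counts for one Polarity value; merging the three responses client-side gives the per-date breakdown asked for in the question.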

-
Thanx: 
Grijesh 
www.gettinhahead.co.in 
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-Count-Based-on-Dates-tp2922371p2930924.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Spatial search - SOLR 3.1

2011-05-12 Thread roySolr
Hello David,

It's easy to calculate it myself, but it would be nice if Solr returned the
distance in the response. Then I can sort on distance and calculate the
distance with PHP to show it to the users.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spatial-search-SOLR-3-1-tp2927579p2930926.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facet filter: how to specify OR expression?

2011-05-12 Thread cnyee
The exception says:

java.lang.NumberFormatException: For input string: "or"

The field type is:
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
omitNorms="true" positionIncrementGap="0"/>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-filter-how-to-specify-OR-expression-tp2930570p2931282.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facet filter: how to specify OR expression?

2011-05-12 Thread rajini maski
The input being assigned to the tint field is the string "or". Solr is
trying to parse "or" as an integer for that field, which fails, hence the
exception. (Note that the Boolean operator must be uppercase OR; a lowercase
or is parsed as a search term.)

On Thu, May 12, 2011 at 4:10 PM, cnyee yeec...@gmail.com wrote:

 The exception says:

 java.lang.NumberFormatException: For input string: "or"

 The field type is:
 <fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
 omitNorms="true" positionIncrementGap="0"/>



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Facet-filter-how-to-specify-OR-expression-tp2930570p2931282.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Document match with no highlight

2011-05-12 Thread Phong Dais
Hi,

Ok, here it is.  Please note that I had to type everything.  I did double
and triple check for typos.
I do not use term vectors.  I also left out the timing section.

Thanks for all the help.
P.

URL:
http://localhost:8983/solr/select?indent=on&version=2.2&q=DOC_TEXT%3A%223+1+15%22&fq=&start=0&rows=10&fl=DOC_TEXT%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=&hl=on&hl.fl=DOC_TEXT&hl.maxAnalyzedChars=-1

XML:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">19</int>
    <lst name="params">
      <str name="explainOther"/>
      <str name="indent">on</str>
      <str name="hl.fl">DOC_TEXT</str>
      <str name="wt">standard</str>
      <str name="hl.maxAnalyzedChars">-1</str>
      <str name="hl">on</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
      <str name="debugQuery">on</str>
      <str name="fl">DOC_TEXT,score</str>
      <str name="start">0</str>
      <str name="q">DOC_TEXT:"3 1 15"</str>
      <str name="qt">standard</str>
      <str name="fq"/>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0" maxScore="0.035959315">
    <doc>
      <float name="score">0.035959315</float>
      <arr name="DOC_TEXT"><str> ... </str></arr>
    </doc>
  </result>
  <lst name="highlighting">
    <lst name="123456"/>
  </lst>
  <lst name="debug">
    <str name="rawquerystring">DOC_TEXT:"3 1 15"</str>
    <str name="querystring">DOC_TEXT:"3 1 15"</str>
    <str name="parsedquery">PhraseQuery(DOC_TEXT:"3 1 15")</str>
    <str name="parsedquery_toString">DOC_TEXT:"3 1 15"</str>
    <lst name="explain">
      <str name="123456">
        0.035959315 = fieldWeight(DOC_TEXT:"3 1 15" in 0), product of:
          1.0 = tf(phraseFreq=1.0)
          0.92055845 = idf(DOC_TEXT: 3=1 1=1 15=1)
          0.0390625 = fieldNorm(field=DOC_TEXT, doc=0)
      </str>
    </lst>
    <str name="QParser">LuceneQParser</str>
    <arr name="filter_queries">
      <str/>
    </arr>
    <arr name="parsed_filter_queries"/>
    <lst name="timing">
      ...
    </lst>
  </lst>
</response>


On Wed, May 11, 2011 at 1:54 PM, Ahmet Arslan iori...@yahoo.com wrote:

  I can upload the search URL and part of the output but not all of it.
  Company trade secrets do not allow me to upload the content of the
  DOC_TEXT field.  I can upload the debug output section and whatever else
  is needed, but I cannot upload the actual document data.
 
  Please let me know if any of this will help without the actual data.

 Sure, they will help, by showing the complete list of parameters.
 Do you store term vectors?



Spellcheck: Two dictionaries

2011-05-12 Thread roySolr
Hello,

I have 2 fields: what and where. For both fields I want spellchecking. I
have 2 dictionaries in my config:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">ws</str>

  <lst name="spellchecker">
    <str name="name">what</str>
    <str name="field">what</str>
    <str name="spellcheckIndexDir">spellchecker_what</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">where</str>
    <str name="field">where</str>
    <str name="spellcheckIndexDir">spellchecker_where</str>
  </lst>
</searchComponent>

I can search one dictionary with spellcheck.dictionary=what in my URL. How
can I get spellchecking for both fields? I see that Solr 3.1 has a
spellcheck.DICT_NAME.key parameter. How can I use that in my URL?
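For reference, the one-dictionary-per-request setup above can be sketched like this (the host, handler path, and example terms are placeholders of mine, not from the original config):

```python
from urllib.parse import urlencode

# Hypothetical spellcheck handler; the dictionary names "what" and "where"
# match the two spellchecker blocks in the config above.
BASE = "http://localhost:8983/solr/spell?"

def spellcheck_url(term, dictionary):
    """Build a request that checks `term` against one named dictionary."""
    return BASE + urlencode({
        "q": term,
        "spellcheck": "true",
        "spellcheck.dictionary": dictionary,
    })

# One request per field/dictionary pair.
what_url = spellcheck_url("restarant", "what")
where_url = spellcheck_url("amsterdm", "where")
```

With this design, checking both fields in one page load means issuing two requests and merging the suggestions client-side.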

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellcheck-Two-dictionaries-tp2931458p2931458.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to update database record after indexing

2011-05-12 Thread vrpar...@gmail.com
Actually, every hour some records are inserted into the database, so every
hour Solr indexing is triggered with a delta import.

Note: the records and data are very large (in GBs).

So each time, the process of finding all the Solr-indexed records and
updating the corresponding database records will be slow.

Is there any event listener or snapshooter that can help me solve this
problem?


Thanks,

Vishal Parekh


--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-update-database-record-after-indexing-tp2874171p2931537.html
Sent from the Solr - User mailing list archive at Nabble.com.


Coord in queryExplain

2011-05-12 Thread Gabriele Kahlout
Hello,

I'm wondering why the results of coord() are not displayed when debugging
query results, as described in the wiki [1]. I'd like to see it.
Could someone point to how to make it appear with the debug fields?

[1] http://wiki.apache.org/solr/SolrRelevancyFAQ#Why_does_id:archangel_come_before_id:hawkgirl_when_querying_for_.22wings.22

-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains [LON] or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
 Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with X.
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: Coord in queryExplain

2011-05-12 Thread Ahmet Arslan
 I'm wondering why the results of coord() are not displayed when debugging
 query results, as described in the wiki [1]. I'd like to see it.
 Could someone point to how to make it appear with the debug fields?

 [1] http://wiki.apache.org/solr/SolrRelevancyFAQ#Why_does_id:archangel_come_before_id:hawkgirl_when_querying_for_.22wings.22

Coord info is displayed; however, it seems it is not displayed for a value of
1.0.
To see coord, issue a multi-word query and advance to the end of the result
list via the start param.


Re: Coord in queryExplain

2011-05-12 Thread Gabriele Kahlout
You are right!

On Thu, May 12, 2011 at 2:54 PM, Ahmet Arslan iori...@yahoo.com wrote:

  I'm wondering why the results of coord() are not displayed
  when debugging
  query results, as described in the
  wiki [1]:
 http://wiki.apache.org/solr/SolrRelevancyFAQ#Why_does_id:archangel_come_before_id:hawkgirl_when_querying_for_.22wings.22
  I'd like to see it.
  Could someone point to how to make it appear with the debug
  fields?

 coord info displayed, however it seems that it is not displayed for value
 of 1.0 .
 To see coord, issue a multi-word query, and advance to the end of the list
 via start param.




-- 
Regards,
K. Gabriele



Re: Document match with no highlight

2011-05-12 Thread Ahmet Arslan
 [quoted query URL and XML response trimmed]


Nothing looks suspicious.

Can you provide two more things:
the fieldType of DOC_TEXT, and
the field definition of DOC_TEXT?

Also, do you get a snippet from the same doc when you remove the quotes from
your query?



Re: Facet Count Based on Dates

2011-05-12 Thread Jasneet Sabharwal

Is it possible to use the features of 3.1 by default for my query ?
On 12-05-2011 13:38, Grijesh wrote:

You can apply patch for Hierarchical faceting on Solr 3.1

-
Thanx:
Grijesh
www.gettinhahead.co.in
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-Count-Based-on-Dates-tp2922371p2930924.html
Sent from the Solr - User mailing list archive at Nabble.com.




--
Regards

Jasneet Sabharwal
Software Developer
NextGen Invent Corporation
+91-9871228582



Re: Facet Count Based on Dates

2011-05-12 Thread Jasneet Sabharwal

Or is it possible to use a Group By query in Solr 3.1 like we do in SQL ?
On 12-05-2011 19:37, Jasneet Sabharwal wrote:

Is it possible to use the features of 3.1 by default for my query ?
On 12-05-2011 13:38, Grijesh wrote:

You can apply patch for Hierarchical faceting on Solr 3.1

-
Thanx:
Grijesh
www.gettinhahead.co.in
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-Count-Based-on-Dates-tp2922371p2930924.html

Sent from the Solr - User mailing list archive at Nabble.com.







--
Regards

Jasneet Sabharwal
Software Developer
NextGen Invent Corporation
+91-9871228582



Re: K-Stemmer for Solr 3.1

2011-05-12 Thread Mark
java.lang.AbstractMethodError: 
org.apache.lucene.analysis.TokenStream.incrementToken()Z


Would you mind explaining your modifications? Thanks

On 5/11/11 11:14 PM, Bernd Fehling wrote:


On 12.05.2011 02:05, Mark wrote:
It appears that the older version of the Lucid Works KStemmer is 
incompatible with Solr 3.1. Has anyone been able to get this to work? 
If not,

what are you using as an alternative?

Thanks


Lucid KStemmer works nice with Solr3.1 after some minor mods to
KStemFilter.java and KStemFilterFactory.java.
What problems do you have?

Bernd


MoreLikeThis PDF search

2011-05-12 Thread Brian Lamb
Hi all,

I've become more and more familiar with the MoreLikeThis handler over the
last several months. I'm curious whether it is possible to do a MoreLikeThis
search by uploading a PDF? I looked at the ExtractingRequestHandler and that
looks like it that is used to process PDF files and the like but is it
possible to combine the two?

Just to be clear, I don't want to send a PDF and have that be a part of the
index. But rather, I'd like to be able to use the PDF as a MoreLikeThis
search.

Thanks,

Brian Lamb


RE: Document match with no highlight

2011-05-12 Thread Bob Sandiford
Don't you need to include your unique id field in your 'fl' parameter?  It
will be needed anyway so you can match up the highlight fragments with the
result docs once highlighting is working...

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com 
Join the conversation - you may even get an iPad or Nook out of it!

Like us on Facebook!

Follow us on Twitter!



 -Original Message-
 From: Ahmet Arslan [mailto:iori...@yahoo.com]
 Sent: Thursday, May 12, 2011 7:10 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Document match with no highlight
 
  [quoted message trimmed]



TrieIntField for short values

2011-05-12 Thread Juan Antonio Farré Basurte
Hello,
I'm quite a beginner in Solr and have many doubts while trying to learn how
everything works.
I have only a slight idea of how trie fields work.
The thing is, I have an integer value that will always be in the range
0-1000. A short field would be enough for this, but there is no
TrieShortField (not even a SortableShortField), so I used a TrieIntField.
My doubt is, in this case, what would be a suitable value for precisionStep.
If the field had only 1000 distinct values, but they were more or less
uniformly distributed over the 32-bit int range, a big precisionStep would
probably be suitable. But since my values are in the range 0 to 1000, I
think (without much knowledge) that a low precisionStep would be more
adequate, for example 2.
Can anybody please help me find a good configuration for this type? And, if
possible, can anybody briefly and intuitively explain the differences and
tradeoffs of choosing smaller or bigger precisionSteps?
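To get a feel for the numbers, here is a back-of-the-envelope sketch (my own arithmetic, offered as an assumption rather than from the Solr docs): a trie-encoded 32-bit int with precisionStep p indexes roughly ceil(32/p) terms per value, one per precision level, so a smaller step means more indexed terms (bigger index) in exchange for fewer terms to visit per range query:

```python
import math

def terms_per_value(precision_step, bits=32):
    """Approximate number of trie terms indexed per value:
    one term for each precision level of the encoded integer."""
    return math.ceil(bits / precision_step)

# Rough index-size cost of different precisionStep choices.
for p in (2, 4, 8, 32):
    print("precisionStep=%2d -> ~%2d terms per value" % (p, terms_per_value(p)))
```

For a field with only ~1000 distinct values, the index-size cost of a small step is modest in absolute terms, so a low precisionStep is a reasonable choice if you run range queries on it; if you only ever do exact matches or sorting, a large step (fewest terms) is the cheaper option.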
Thanks a lot,

Juan

Re: Result docs missing only when shards parameter present in query?

2011-05-12 Thread mrw

Does this seem like it would be a configuration issue, an indexed data
issue, or something else?

Thanks


mrw wrote:
 
 We have two Solr nodes, each with multiple shards.  If we query each shard
 directly (no shards parameter), we get the expected results:
 
 response
lst name=responseHeader
int name=status 0
int name=QTime  22
result name=response numFound=100 start=0
 doc
 doc
   
 (^^^ hand-typed pseudo XML)
 
 However, if we add the shards parameter and even supply one of the above
 shards, we get the same number of results, but all the doc elements under
 the result element are missing:
 
 response
lst name=responseHeader
int name=status 0
int name=QTime  33
result name=response numFound=100 start=0

 
 (^^^ note missing doc elements)
 
 It doesn't matter which shard is specified in the shards parameter;  if
 any or all of the shards are specified after the shards parameter, we see
 this behavior.
 
 When we go to http://server:8983/solr/  on either node, we see all the
 shards properly listed.  
 
 So, the shards seem to be registered properly, and work individually, but
 not when the shards parameter is supplied.   Any ideas?
 
 
 Thanks!
 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Result-docs-missing-only-when-shards-parameter-present-in-query-tp2928889p2932248.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Document match with no highlight

2011-05-12 Thread Phong Dais
Hi,

<field name="DOC_TEXT" type="text" indexed="true" stored="true"/>

The text type is the default one that came with the stock Solr 1.4
install, without any modifications.

If I remove the quotes I do get snippets.  In fact, with "3 1 15"~1 I get a
snippet as well.

Hope that helps.

P.

On Thu, May 12, 2011 at 9:09 AM, Ahmet Arslan iori...@yahoo.com wrote:

   [quoted message trimmed]




Re: Document match with no highlight

2011-05-12 Thread Phong Dais
Hi,

I use
<uniqueKey>DOC_ID</uniqueKey>
in schema.xml.

I think this is the default unique id that is used for matching.  Someone
correct me if I am wrong.

P.



On Thu, May 12, 2011 at 11:01 AM, Bob Sandiford 
bob.sandif...@sirsidynix.com wrote:

 Don't you need to include your unique id field in your 'fl' parameter?  It
 will be needed anyways so you can match up the highlight fragments with the
 result docs once highlighting is working...

 Bob Sandiford | Lead Software Engineer | SirsiDynix
 P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
 www.sirsidynix.com
 Join the conversation - you may even get an iPad or Nook out of it!

 Like us on Facebook!

 Follow us on Twitter!



  -Original Message-
  From: Ahmet Arslan [mailto:iori...@yahoo.com]
  Sent: Thursday, May 12, 2011 7:10 AM
  To: solr-user@lucene.apache.org
   Subject: Re: Document match with no highlight
 
   URL:
  
  http://localhost:8983/solr/select?indent=onversion=2.2q=DOC_TEXT%3A%2
  23+1+15%22fq=start=0
  
  rows=10fl=DOC_TEXT%2Cscoreqt=standardwt=standarddebugQuery=onexpl
  ainOther=hl=onhl.fl=DOC_TEXThl.maxAnalyzedChars=-1
  
   XML:
   ?xml version=1.0 encoding=UTF-8?
   response
 lst name=responseHeader
   int name=status0/int
   int name=QTime19/int
   lst name=params
 str name=explainOther/
 str
   name=indenton/str
 str
   name=hl.flDOC_TEXT/str
 str
   name=wtstandard/str
 str
   name=hl.maxAnalyzedChars-1/str
 str name=hlon/str
 str name=rows10/str
 str
   name=version2.2/str
 str
   name=debugQueryon/str
 str
   name=flDOC_TEXT,score/str
 str name=start0/str
 str name=qDOC_TEXT:3 1
   15/str
 str
   name=qtstandard/str
 str name=fq/
   /lst
 /lst
 result name=response numFound=1 start=0
   maxScore=0.035959315
   doc
 float
   name=score0.035959315/float
 arr name=DOC_TEXTstr
   ... /str/arr
   /doc
 /result
 lst name=highlighting
   lst name=123456/
 /lst
 lst name=debug
   str name=rawquerystringDOC_TEXT:3
   1 15/str
   str name=querystringDOC_TEXT:3 1
   15/str
   str
   name=parsedqueryPhraseQuery(DOC_TEXT:3 1
   15)/str
   str
   name=parsedquery_toStringDOC_TEXT:3 1
   15/str
   lst name=explain
 str name=123456
   0.035959315 =
   fieldWeight(DOC_TEXT:3 1 15 in 0), product of: 1.0 =
   tf(phraseFreq=1.0)
   0.92055845 = idf(DOC_TEXT: 3=1
   1=1 15=1)
   0.0390625 =
   fieldNorm(field=DOC_TEXT, doc=0)
   /str
 /lst
 str name=QParserLuceneQParser/str
 arr name=filter_queries
   str/
 /arr
 arr name=parsed_filter_queries/
 lst name=timing
   ...
 /lst
   /response
 
 
  Nothing looks suspicious.
 
  Can you provide two things more;
  fieldType of DOC_TEXT
  and
  field definition of DOC_TEXT.
 
  Also do you get snippet from the same doc, when you remove quotes from
  your query?
 





Changing the schema

2011-05-12 Thread Brian Lamb
If I change the field type in my schema, do I need to rebuild the entire
index? I'm at a point now where it takes over a day to do a full import due
to the sheer size of my application and I would prefer not having to reindex
just because I want to make a change somewhere.

Thanks,

Brian Lamb


RE: Document match with no highlight

2011-05-12 Thread Pierre GOSSE
 In fact if I did 3 1 15~1 I do get snipet also.

Strange, I had a very similar problem, but with overlapping tokens. Since 
you're using the standard text field, this should be your case. 

Maybe you could have a look at this issue, since it sounds very familiar to me:
https://issues.apache.org/jira/browse/LUCENE-3087

Pierre

-Message d'origine-
De : Phong Dais [mailto:phong.gd...@gmail.com] 
Envoyé : jeudi 12 mai 2011 17:26
À : solr-user@lucene.apache.org
Objet : Re: Document match with no highlight

Hi,

field name=DOC_TEXT type=text indexed=true stored=true/

The type text is the default one that came with the default solr 1.4
install w.o any modifications.

If I remove the quotes I do get snippets.  In fact if I did "3 1 15"~1 I do
get a snippet also.

Hope that helps.

P.

On Thu, May 12, 2011 at 9:09 AM, Ahmet Arslan iori...@yahoo.com wrote:

   URL:
 
 http://localhost:8983/solr/select?indent=onversion=2.2q=DOC_TEXT%3A%223+1+15%22fq=start=0
 
 rows=10fl=DOC_TEXT%2Cscoreqt=standardwt=standarddebugQuery=onexplainOther=hl=onhl.fl=DOC_TEXThl.maxAnalyzedChars=-1
 
  XML:
  ?xml version=1.0 encoding=UTF-8?
  response
lst name=responseHeader
  int name=status0/int
  int name=QTime19/int
  lst name=params
str name=explainOther/
str
  name=indenton/str
str
  name=hl.flDOC_TEXT/str
str
  name=wtstandard/str
str
  name=hl.maxAnalyzedChars-1/str
str name=hlon/str
str name=rows10/str
str
  name=version2.2/str
str
  name=debugQueryon/str
str
  name=flDOC_TEXT,score/str
str name=start0/str
str name=qDOC_TEXT:3 1
  15/str
str
  name=qtstandard/str
str name=fq/
  /lst
/lst
 result name=response numFound=1 start=0
  maxScore=0.035959315
  doc
float
  name=score0.035959315/float
arr name=DOC_TEXTstr
  ... /str/arr
   /doc
/result
lst name=highlighting
  lst name=123456/
/lst
lst name=debug
  str name=rawquerystringDOC_TEXT:3
  1 15/str
  str name=querystringDOC_TEXT:3 1
  15/str
  str
  name=parsedqueryPhraseQuery(DOC_TEXT:3 1
  15)/str
  str
  name=parsedquery_toStringDOC_TEXT:3 1
  15/str
  lst name=explain
str name=123456
  0.035959315 =
  fieldWeight(DOC_TEXT:3 1 15 in 0), product of: 1.0 =
  tf(phraseFreq=1.0)
  0.92055845 = idf(DOC_TEXT: 3=1
  1=1 15=1)
  0.0390625 =
  fieldNorm(field=DOC_TEXT, doc=0)
  /str
/lst
str name=QParserLuceneQParser/str
arr name=filter_queries
  str/
/arr
arr name=parsed_filter_queries/
lst name=timing
  ...
/lst
  /response


 Nothing looks suspicious.

 Can you provide two things more;
 fieldType of DOC_TEXT
 and
 field definition of DOC_TEXT.

 Also do you get snippet from the same doc, when you remove quotes from your
 query?




RE: Document match with no highlight

2011-05-12 Thread Pierre GOSSE
 Since you're using the standard text field, this should NOT be your case.

Sorry for the missing NOT in the previous phrase. You should have the same issue 
given what you said, but still, it sounds very similar. 

Are you sure your fieldtype text has nothing special? A tokenizer or filter 
that could add some tokens to your indexed text but not to your query, for 
example a WordDelimiterFilter present at index time but not at query time?

Pierre

-Message d'origine-
De : Pierre GOSSE [mailto:pierre.go...@arisem.com] 
Envoyé : jeudi 12 mai 2011 18:21
À : solr-user@lucene.apache.org
Objet : RE: Document match with no highlight

 In fact if I did 3 1 15~1 I do get snipet also.

Strange, I had a very similar problem, but with overlapping tokens. Since 
you're using the standard text field, this should be you're case. 

Maybe you could have a look at this issue, since it sound very familiar to me :
https://issues.apache.org/jira/browse/LUCENE-3087

Pierre

-Message d'origine-
De : Phong Dais [mailto:phong.gd...@gmail.com] 
Envoyé : jeudi 12 mai 2011 17:26
À : solr-user@lucene.apache.org
Objet : Re: Document match with no highlight

Hi,

field name=DOC_TEXT type=text indexed=true stored=true/

The type text is the default one that came with the default solr 1.4
install w.o any modifications.

If I remove the quotes I do get snipets.  In fact if I did 3 1 15~1 I do
get snipet also.

Hope that helps.

P.

On Thu, May 12, 2011 at 9:09 AM, Ahmet Arslan iori...@yahoo.com wrote:

   URL:
 
 http://localhost:8983/solr/select?indent=onversion=2.2q=DOC_TEXT%3A%223+1+15%22fq=start=0
 
 rows=10fl=DOC_TEXT%2Cscoreqt=standardwt=standarddebugQuery=onexplainOther=hl=onhl.fl=DOC_TEXThl.maxAnalyzedChars=-1
 
  XML:
  ?xml version=1.0 encoding=UTF-8?
  response
lst name=responseHeader
  int name=status0/int
  int name=QTime19/int
  lst name=params
str name=explainOther/
str
  name=indenton/str
str
  name=hl.flDOC_TEXT/str
str
  name=wtstandard/str
str
  name=hl.maxAnalyzedChars-1/str
str name=hlon/str
str name=rows10/str
str
  name=version2.2/str
str
  name=debugQueryon/str
str
  name=flDOC_TEXT,score/str
str name=start0/str
str name=qDOC_TEXT:3 1
  15/str
str
  name=qtstandard/str
str name=fq/
  /lst
/lst
 result name=response numFound=1 start=0
  maxScore=0.035959315
  doc
float
  name=score0.035959315/float
arr name=DOC_TEXTstr
  ... /str/arr
   /doc
/result
lst name=highlighting
  lst name=123456/
/lst
lst name=debug
  str name=rawquerystringDOC_TEXT:3
  1 15/str
  str name=querystringDOC_TEXT:3 1
  15/str
  str
  name=parsedqueryPhraseQuery(DOC_TEXT:3 1
  15)/str
  str
  name=parsedquery_toStringDOC_TEXT:3 1
  15/str
  lst name=explain
str name=123456
  0.035959315 =
  fieldWeight(DOC_TEXT:3 1 15 in 0), product of: 1.0 =
  tf(phraseFreq=1.0)
  0.92055845 = idf(DOC_TEXT: 3=1
  1=1 15=1)
  0.0390625 =
  fieldNorm(field=DOC_TEXT, doc=0)
  /str
/lst
str name=QParserLuceneQParser/str
arr name=filter_queries
  str/
/arr
arr name=parsed_filter_queries/
lst name=timing
  ...
/lst
  /response


 Nothing looks suspicious.

 Can you provide two things more;
 fieldType of DOC_TEXT
 and
 field definition of DOC_TEXT.

 Also do you get snippet from the same doc, when you remove quotes from your
 query?




Support for huge data set?

2011-05-12 Thread atreyu
Hi,

I have about 300 million docs (or 10TB data) which is doubling every 3
years, give or take.  The data mostly consists of Oracle records, webpage
files (HTML/XML, etc.) and office doc files.  There are between two and four
dozen concurrent users, typically.  The indexing server has 27 GB of RAM,
but it still gets extremely taxed, and this will only get worse.

Would Solr be able to efficiently deal with a load of this size?  I am
trying to avoid the heavy cost of GSA, etc...

Thanks.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Support-for-huge-data-set-tp2932652p2932652.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Support for huge data set?

2011-05-12 Thread Darren Govoni
I have the same questions. 

But from your message, I couldn't tell. Are you using Solr now? Or some
other indexing server?

Darren

On Thu, 2011-05-12 at 09:59 -0700, atreyu wrote:
 Hi,
 
 I have about 300 million docs (or 10TB data) which is doubling every 3
 years, give or take.  The data mostly consists of Oracle records, webpage
 files (HTML/XML, etc.) and office doc files.  There are b/t two and four
 dozen concurrent users, typically.  The indexing server has  27 GB of RAM,
 but it still gets extremely taxed, and this will only get worse. 
 
 Would Solr be able to efficiently deal with a load of this size?  I am
 trying to avoid the heavy cost of GSA, etc...
 
 Thanks.
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Support-for-huge-data-set-tp2932652p2932652.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Support for huge data set?

2011-05-12 Thread atreyu
Oh, my fault.  No, I am not using Solr yet - just evaluating it.  The current
implementation is a combination of Sphinx and Oracle Text, but I have not
been involved with any of the integration - I'm more of an outside analyst
looking in, but will probably be involved in the integration of any new
methods, particularly Open Source ones.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Support-for-huge-data-set-tp2932652p2932704.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Support for huge data set?

2011-05-12 Thread Darren Govoni
Ok, thanks. Yeah, I'm in the same boat and want to know what others have
done with document numbers that large.

I know there is SolrCloud that can federate numerous solr instances and
query across them, so I suspect some solution with 100's of M's of docs
would require a federation.

If anyone has done this, some best practices would be great to know!

On Thu, 2011-05-12 at 10:10 -0700, atreyu wrote:
 Oh, my fault.  No, I am not using Solr yet - just evaluating it.  The current
 implementation is a combination of Sphinx and Oracle Text, but I have not
 been involved with any of the integration - I'm more of an outside analyst
 looking in, but will probably be involved in the integration of any new
 methods, particularly Open Source ones.
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Support-for-huge-data-set-tp2932652p2932704.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Document match with no highlight

2011-05-12 Thread Phong Dais
Hi,

I read the link provided and I'll need some time to digest what it is
saying.

Here's my text fieldtype.

fieldtype name=text class=solr.TextField positionIncrementGap=100
  analyzer type=index
tokenizer class=solr.WhitespaceTokenizerFactory/
filter class=solr.StopFilterFactory ignoreCase=true
words=stopwords.txt enablePositionIncrements=true/
filter class=solr.WordDelimiterFilterFactory generateWordParts=1
generateNumberParts=1
  catenateWords=1 catenateNumbers=1 catenateAll=0
splitOnCaseChange=1/
filter class=solr.LowerCaseFilterFactory/
filter class=solr.SnowballPorterFilterFactory language=English
protected=protwords.txt/
  /analyzer
  analyzer type=query
tokenizer class=solr.WhitespaceTokenizerFactory/
filter class=solr.SynonymFilterFactory synonyms=synonyms.txt
ignoreCase=true expand=true/
filter class=solr.StopFilterFactory ignoreCase=true
words=stopwords.txt enablePositionIncrements=true/
filter class=solr.WordDelimiterFilterFactory generateWordParts=1
generateNumberParts=1
  catenateWords=0 catenateNumbers=0 catenateAll=0
splitOnCaseChange=1/
filter class=solr.LowerCaseFilterFactory/
filter class=solr.SnowballPorterFilterFactory language=English
protected=protwords.txt/
  /analyzer
/fieldtype
Also, I figured out what value in DOC_TEXT causes this issue to occur.
With a DOC_TEXT of (without the quotes):
"0176 R3 1.5 TO "

Searching for "3 1 15" returns a match with empty highlight.
Searching for "3 1 15"~1 returns a match with highlight.
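To illustrate the index/query asymmetry in the config above (a simplified sketch, not Solr's actual WordDelimiterFilter): the index analyzer uses catenateNumbers=1 while the query analyzer uses catenateNumbers=0, so a token like 1.5 produces different terms on each side:

```python
import re

def number_parts(token, catenate_numbers):
    # Simplified sketch of WordDelimiterFilter number handling: split on
    # non-alphanumerics, lowercase, and optionally emit the catenated number
    # as an extra token that overlaps the position of the last part.
    parts = [p.lower() for p in re.split(r"[^0-9A-Za-z]+", token) if p]
    out = list(parts)
    if catenate_numbers and len(parts) > 1 and all(p.isdigit() for p in parts):
        out.append("".join(parts))  # overlapping catenated token
    return out

# Index-time config (catenateNumbers=1) emits an extra overlapping term "15":
print(number_parts("1.5", catenate_numbers=True))   # ['1', '5', '15']
# Query-time config (catenateNumbers=0) does not:
print(number_parts("1.5", catenate_numbers=False))  # ['1', '5']
```

That extra catenated term overlaps the position of the last number part, and overlapping tokens are exactly the condition discussed in LUCENE-3087.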

Can anyone see anything that I'm missing?

Thanks,
P.


On Thu, May 12, 2011 at 12:27 PM, Pierre GOSSE pierre.go...@arisem.comwrote:

  Since you're using the standard text field, this should NOT be you're
 case.

 Sorry, for the missing NOT in previous phrase. You should have the same
 issue given what you said, but still, it sound very similar.

 Are you sure your fieldtype text has nothing special ? a tokenizer or
 filter that could add some token in your indexed text but not in your query,
 like for example a WordDelimiter present in index and not query ?

 Pierre

 -Message d'origine-
 De : Pierre GOSSE [mailto:pierre.go...@arisem.com]
 Envoyé : jeudi 12 mai 2011 18:21
 À : solr-user@lucene.apache.org
 Objet : RE: Document match with no highlight

  In fact if I did 3 1 15~1 I do get snipet also.

 Strange, I had a very similar problem, but with overlapping tokens. Since
 you're using the standard text field, this should be you're case.

 Maybe you could have a look at this issue, since it sound very familiar to
 me :
 https://issues.apache.org/jira/browse/LUCENE-3087

 Pierre

 -Message d'origine-
 De : Phong Dais [mailto:phong.gd...@gmail.com]
 Envoyé : jeudi 12 mai 2011 17:26
 À : solr-user@lucene.apache.org
 Objet : Re: Document match with no highlight

 Hi,

 field name=DOC_TEXT type=text indexed=true stored=true/

 The type text is the default one that came with the default solr 1.4
 install w.o any modifications.

 If I remove the quotes I do get snipets.  In fact if I did 3 1 15~1 I do
 get snipet also.

 Hope that helps.

 P.

 On Thu, May 12, 2011 at 9:09 AM, Ahmet Arslan iori...@yahoo.com wrote:

URL:
  
 
 http://localhost:8983/solr/select?indent=onversion=2.2q=DOC_TEXT%3A%223+1+15%22fq=start=0
  
 
 rows=10fl=DOC_TEXT%2Cscoreqt=standardwt=standarddebugQuery=onexplainOther=hl=onhl.fl=DOC_TEXThl.maxAnalyzedChars=-1
  
   XML:
   ?xml version=1.0 encoding=UTF-8?
   response
 lst name=responseHeader
   int name=status0/int
   int name=QTime19/int
   lst name=params
 str name=explainOther/
 str
   name=indenton/str
 str
   name=hl.flDOC_TEXT/str
 str
   name=wtstandard/str
 str
   name=hl.maxAnalyzedChars-1/str
 str name=hlon/str
 str name=rows10/str
 str
   name=version2.2/str
 str
   name=debugQueryon/str
 str
   name=flDOC_TEXT,score/str
 str name=start0/str
 str name=qDOC_TEXT:3 1
   15/str
 str
   name=qtstandard/str
 str name=fq/
   /lst
 /lst
  result name=response numFound=1 start=0
   maxScore=0.035959315
   doc
 float
   name=score0.035959315/float
 arr name=DOC_TEXTstr
   ... /str/arr
    /doc
 /result
 lst name=highlighting
   lst name=123456/
 /lst
 lst name=debug
   str name=rawquerystringDOC_TEXT:3
   1 15/str
   str name=querystringDOC_TEXT:3 1
   15/str
   str
   name=parsedqueryPhraseQuery(DOC_TEXT:3 1
   15)/str
   str
   name=parsedquery_toStringDOC_TEXT:3 1
   15/str
   lst name=explain
 str name=123456
   0.035959315 =
   fieldWeight(DOC_TEXT:3 1 15 in 0), product of: 1.0 =
   tf(phraseFreq=1.0)
   0.92055845 = idf(DOC_TEXT: 3=1
   1=1 15=1)
   0.0390625 =
   fieldNorm(field=DOC_TEXT, doc=0)
   /str
 /lst
 str name=QParserLuceneQParser/str
 arr name=filter_queries
   str/
 /arr
 arr 

What is correct use of HTMLStripCharFilter in Solr 3.1

2011-05-12 Thread nicksnels
Hi,

I recently upgraded from Solr 1.3 to Solr 3.1 in order to take advantage of
the HTMLStripCharFilter. But it isn't working as I expected.

I have a text field that may contain HTML tags. I however would like to
store it in Solr without the HTML tags. And retrieve the text field for
display and for highlighting without HTML tags.

I added charFilter class=solr.HTMLStripCharFilterFactory/ to the top of
fieldType name=text class=solr.TextField positionIncrementGap=100
autoGeneratePhraseQueries=true in the schema.xml file of the solr
example, both in analyzer type=index and in analyzer type=query.

And the text field is simply:

field name=text type=text indexed=true stored=true/

Now, when I do a search, the text field still has all the HTML tags in it,
and the highlighting is totally screwed up, with em tags around virtually
every word. What am I doing wrong?

Kind regards,

Nick

--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-is-correct-use-of-HTMLStripCharFilter-in-Solr-3-1-tp2933021p2933021.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Anyone familiar with Solandra or Lucendra?

2011-05-12 Thread kenf_nc
I modified the subject to include Lucendra, in case anyone has heard of it by
that name. 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Anyone-familiar-with-Solandra-or-Lucendra-tp2927357p2933051.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: What is correct use of HTMLStripCharFilter in Solr 3.1

2011-05-12 Thread Ahmet Arslan
 I recently upgraded from Solr 1.3 to Solr 3.1 in order to
 take advantage of
 the HTMLStripCharFilter. But it isn't working as I
 expected.
 
 I have a text field that may contain HTML tags. I however
 would like to
 store it in Solr without the HTML tags. And retrieve the
 text field for
 display and for highlighting without HTML tags.
 
 I added charFilter
 class=solr.HTMLStripCharFilterFactory/ to the top of
 fieldType name=text class=solr.TextField
 positionIncrementGap=100
 autoGeneratePhraseQueries=true in the schema.xml file
 of the solr
 example, both in analyzer type=index and in
 analyzer type=query.
 
 And the text field is simply:
 
 field name=text type=text indexed=true
 stored=true/
 
 Now, when I do a search. The text field still has all the
 HTML tags in them
 and the highlighting is totally screwed up with em tags
 around virtually
 every word. What am I doing wrong?

You need to strip the HTML tags before the analysis phase. If you are using DIH, you can 
use the stripHTML=true transformer (HTMLStripTransformer).
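If you are not using DIH, the same idea applies client-side: strip the markup before the document is sent to Solr, so the stored value (not just the indexed tokens) is tag-free. A minimal sketch using only the Python standard library:

```python
from html.parser import HTMLParser

class TagStripper(HTMLParser):
    """Collects only the text content of a document, dropping all markup."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def strip_html(markup):
    # Strip tags BEFORE the document reaches Solr, so the stored field
    # value is already tag-free when it comes back in search results.
    stripper = TagStripper()
    stripper.feed(markup)
    stripper.close()
    return "".join(stripper.chunks)

print(strip_html("<p>Hello <b>world</b></p>"))  # Hello world
```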


Re: Anyone familiar with Solandra or Lucandra?

2011-05-12 Thread Smiley, David W.
The old name is Lucandra not Lucendra. I've changed the subject accordingly.

I'm looking forward to responses from people but I'm afraid it appears it has 
not gotten much uptake yet. I think it has enormous potential once it's 
hardened a bit and there's more documentation. Personally, I've been looking 
forward to kicking the tires a bit once I get some time.

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

On May 12, 2011, at 2:54 PM, kenf_nc wrote:

 I modified the subject to include Lucendra, in case anyone has heard of it by
 that name. 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Anyone-familiar-with-Solandra-or-Lucendra-tp2927357p2933051.html
 Sent from the Solr - User mailing list archive at Nabble.com.






Re: What is correct use of HTMLStripCharFilter in Solr 3.1

2011-05-12 Thread Jonathan Rochkind

On 5/12/2011 2:55 PM, Ahmet Arslan wrote:

I recently upgraded from Solr 1.3 to Solr 3.1 in order to
take advantage of
the HTMLStripCharFilter. But it isn't working as I
expected.


You need to strip html tag before analysis phase. If you are using DIH, you can use 
stripHTML=true transformer.




Wait, then what's the HTMLStripCharFilter for?


Re: Support for huge data set?

2011-05-12 Thread Jonathan Rochkind
If each document is VERY small, it's actually possible that one Solr 
server could handle it -- especially if you DON'T try to do faceting or 
other similar features, but stick to straight search and relevancy. 
There are other factors too. But # of documents is probably less 
important than total size of index, or number of unique terms -- of 
course # of documents often correlates to those too.


But if each document is largeish... yeah, I suspect that'll be too much 
for any one Solr server. You'll have to use some kind of distribution. 
Out of the box, Solr has a Distributed Search function meant for this 
use case. http://wiki.apache.org/solr/DistributedSearch  .   Some Solr 
features don't work under a Distributed setup, but the basic ones are 
there. There are some other add-ons not (yet anyway) part of Solr distro 
that try to solve this in even more sophisticated ways too, like SolrCloud.
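As a small illustration of the out-of-the-box approach (the host names here are made up), a distributed request is an ordinary query plus a shards parameter listing the cores to fan out to:

```python
from urllib.parse import urlencode

def distributed_query_url(aggregator, shards, q, rows=10):
    # The node that receives the request queries every core listed in
    # "shards" and merges the partial results into one response.
    params = urlencode({"q": q, "rows": rows, "shards": ",".join(shards)})
    return "http://%s/select?%s" % (aggregator, params)

url = distributed_query_url(
    "solr1:8983/solr",
    ["solr1:8983/solr", "solr2:8983/solr", "solr3:8983/solr"],
    "title:lucene",
)
print(url)
```

Any of the shards can act as the aggregator; it just needs the full list of shard cores in the request.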


I don't personally know of anyone indexing that many documents, although 
it is probably done. But I do know of the HathiTrust project (not me 
personally) indexing fewer documents but still adding up to terabytes 
of total index (millions to tens of millions of documents, but each one 
is a digitized book that could be 100-400 pages), using the Distributed 
Search feature successfully, although it required some care and 
maintenance; it wasn't just a turn-it-on-and-it-works situation.


http://www.hathitrust.org/blogs/large-scale-search/scaling-large-scale-search-50-volumes-5-million-volumes-and-beyond

http://www.hathitrust.org/technical_reports/Large-Scale-Search.pdf

On 5/12/2011 1:06 PM, Darren Govoni wrote:

I have the same questions.

But from your message, I couldn't tell. Are you using Solr now? Or some
other indexing server?

Darren

On Thu, 2011-05-12 at 09:59 -0700, atreyu wrote:

Hi,

I have about 300 million docs (or 10TB data) which is doubling every 3
years, give or take.  The data mostly consists of Oracle records, webpage
files (HTML/XML, etc.) and office doc files.  There are b/t two and four
dozen concurrent users, typically.  The indexing server has  27 GB of RAM,
but it still gets extremely taxed, and this will only get worse.

Would Solr be able to efficiently deal with a load of this size?  I am
trying to avoid the heavy cost of GSA, etc...

Thanks.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Support-for-huge-data-set-tp2932652p2932652.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: What is correct use of HTMLStripCharFilter in Solr 3.1

2011-05-12 Thread Mike Sokolov
It preserves the location of the terms in the original HTML document so 
that you can highlight terms in HTML.  This makes it possible (for 
instance) to display the entire document, with all the search terms 
highlighted, or (with some careful surgery) to display formatted HTML 
(bold, italic, etc) in your search results.


-Mike

On 05/12/2011 03:42 PM, Jonathan Rochkind wrote:

On 5/12/2011 2:55 PM, Ahmet Arslan wrote:

I recently upgraded from Solr 1.3 to Solr 3.1 in order to
take advantage of
the HTMLStripCharFilter. But it isn't working as I
expected.

You need to strip html tag before analysis phase. If you are using 
DIH, you can use stripHTML=true transformer.





Wait, then what's the HTMLStripCharFilter for?


Re: What is correct use of HTMLStripCharFilter in Solr 3.1

2011-05-12 Thread Ahmet Arslan
 Wait, then what's the HTMLStripCharFilter for?

To remove HTML tags in the analysis phase. For instance, it can be used to 
display original HTML documents with search terms highlighted. 


Re: Support for huge data set?

2011-05-12 Thread atreyu
Thanks for the detailed response, Jonathon.  I will look into the links and
check out SolrCloud and Distributed Search.  Load-sharing b/t 2 or 3 servers
should not pose a problem, so long as it is robust (or at least not slower),
fault-tolerant, and reliable.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Support-for-huge-data-set-tp2932652p2933367.html
Sent from the Solr - User mailing list archive at Nabble.com.


field type=string vs field type=text

2011-05-12 Thread chetan
What is the difference between setting a fields type to string vs setting it
to text.

e.g.
field name=PATH type=string indexed=false stored=true/
or
field name=PATH type=text indexed=false stored=true/



--
View this message in context: 
http://lucene.472066.n3.nabble.com/field-type-string-vs-field-type-text-tp2932083p2932083.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: field type=string vs field type=text

2011-05-12 Thread Gora Mohanty
On Thu, May 12, 2011 at 8:23 PM, chetan guptakache...@gmail.com wrote:
 What is the difference between setting a fields type to string vs setting it
 to text.

 e.g.
 field name=PATH type=string indexed=false stored=true/
 or
 field name=PATH type=text indexed=false stored=true/
[...]

Please take a closer look at the fieldType definitions towards the
beginning of the default schema.xml. The text type has tokenizers
and analyzers applied to it, while the string type does no processing
of the input data.
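A toy illustration of that difference (these functions are stand-ins, not Solr's real analyzers): a string field indexes the whole value as a single term, while a typical text field tokenizes and lowercases it:

```python
def index_as_string(value):
    # "string" type: the exact value becomes one indexed term.
    return [value]

def index_as_text(value):
    # Toy stand-in for a "text" analyzer chain: tokenize on whitespace and
    # lowercase (real chains also remove stopwords, stem, split words, etc.).
    return [token.lower() for token in value.split()]

print(index_as_string("Apache Solr Rocks"))  # ['Apache Solr Rocks']
print(index_as_text("Apache Solr Rocks"))    # ['apache', 'solr', 'rocks']
```

Exact-match filtering and sorting want the string behavior; full-text search wants the text behavior.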

Regards,
Gora


A couple newbie questions

2011-05-12 Thread Stuart Smith
Hello!
  I just started using Solr. My general use case is pushing a lot of data from 
HBase to Solr via an M/R job using SolrJ. I have lots of questions, but the 
ones I'd like to start with are:

(1)
I noticed this:
http://lucene.472066.n3.nabble.com/what-happens-to-docsPending-if-stop-solr-before-commit-td2781493.html

Would seem to indicate that pending documents are committed on restart. This is 
great! I also noticed that, while there is a lag on start up if I have 
documents pending, it's only a few minutes or so. But if I issue a commit for 
the same number of files, the server stays blocked for 20 min or so. It almost 
seems like it would be faster to add all my documents and restart the server, 
rather than issuing a commit. Am I doing something strange? Is this a valid 
conclusion?

(2)
I'm also getting a lot of errors about invalid UTF-8:

SEVERE: org.apache.solr.common.SolrException: Invalid UTF-8 character 0x at 
char #2380289, byte #2378666)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)

It could be that the values I have in some of my document fields are indeed 
invalid. My question is what does this mean when I'm submitting a batch of 
documents (specifically I'm using SolrJ's StreamingUpdateSolrServer w/ a 
BinaryRequestWriter) - do I:

- lose the whole batch that has the bad document?
- lose the document?
- lose the one field?

I wish it was the third, hope it's the second, and I'm afraid it's the first...
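On the invalid UTF-8 errors: a character like 0x0 is illegal in XML 1.0 regardless of encoding, so a common workaround (an assumption on my part, not something established in this thread) is to drop such characters client-side before handing field values to the update request. A sketch:

```python
import re

# XML 1.0 forbids most control characters; below #x20 only #x9 (tab),
# #xA (newline) and #xD (carriage return) are allowed. Remove the rest
# before building the update request.
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

def sanitize_for_xml(text):
    """Strip characters that are not legal in an XML 1.0 document."""
    return _ILLEGAL_XML_CHARS.sub("", text)

print(sanitize_for_xml("bad\x00value\x01here"))  # badvaluehere
```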

Ooo.. and I guess a third question - I'm having trouble finding a document that 
describes the overall design/functionality of Solr, something that would help 
me reason about stuff like what happens to pending documents when the server 
restarts, or whether a commit in one indexing thread commits previously added 
documents from another indexing thread. Both of those I've answered to my 
satisfaction by looking over the Solr logs and mailing lists, but I'm wondering 
if there's some documentation I missed somehow..
For example, something like this:
http://hadoop.apache.org/common/docs/current/hdfs_design.html
http://hbase.apache.org/book.html#architecture

Thanks!

Take care,
  -stu


Re: field type=string vs field type=text

2011-05-12 Thread Tomás Fernández Löbbe
Hi, my recommendation: To quickly understand the difference between those
two different field types, index one document using string and text fields,
then facet on those fields and you will see how the terms were indexed.

Using one field type or the other will depend on what you want to do with
that field.

On Thu, May 12, 2011 at 5:18 PM, Gora Mohanty g...@mimirtech.com wrote:

 On Thu, May 12, 2011 at 8:23 PM, chetan guptakache...@gmail.com wrote:
  What is the difference between setting a fields type to string vs setting
 it
  to text.
 
  e.g.
  field name=PATH type=string indexed=false stored=true/
  or
  field name=PATH type=text indexed=false stored=true/
 [...]

 Please take a closer look at the fieldType definitions towards the
 beginning of the default schema.xml. The text type has tokenizers,
 and analyzers applied to it, while the string type does no processing
 of the input data.

 Regards,
 Gora



Re: Replication Clarification Please

2011-05-12 Thread Ravi Solr
Thank you Mr. Bell and Mr. Kanarsky, as per your advice we have moved
from 1.4.1 to 3.1 and have made several changes to the configuration. The
configuration changes have worked nicely till now and the replication
is finishing within the interval and not backing up. The changes we
made are as follows

1. Increased the mergeFactor from 10 to 15
2. Increased ramBufferSizeMB to 1024
3. Changed lockType to single (previously it was simple)
4. Set maxCommitsToKeep to 1 in the deletionPolicy
5. Set maxPendingDeletes to 0
6. Changed caches from LRUCache to FastLRUCache as we had hit ratios
well over 75% to increase warming speed
7. Increased the poll interval to 6 minutes and re-indexed all content.

Thanks,

Ravi Kiran Bhaskar

On Wed, May 11, 2011 at 6:00 PM, Alexander Kanarsky
alexan...@trulia.com wrote:
 Ravi,

 if you have what looks like a full replication each time even if the
 master generation is greater than the slave's, try to watch the index on
 both master and slave at the same time to see what files are getting
 replicated. You may need to adjust your merge factor, as Bill
 mentioned.

 -Alexander



 On Tue, 2011-05-10 at 12:45 -0400, Ravi Solr wrote:
 Hello Mr. Kanarsky,
                 Thank you very much for the detailed explanation,
 probably the best explanation I found regarding replication. Just to
 be sure, I wanted to test solr 3.1 to see if it alleviates the
 problems...I dont think it helped. The master index version and
 generation are greater than the slave, still the slave replicates the
 entire index form master (see replication admin screen output below).
 Any idea why it would get the whole index everytime even in 3.1 or am
 I misinterpreting the output ? However I must admit that 3.1 finished
 the replication unlike 1.4.1 which would hang and be backed up for
 ever.

 Master        http://masterurl:post/solr-admin/searchcore/replication
       Latest Index Version:null, Generation: null
       Replicatable Index Version:1296217097572, Generation: 12726

 Poll Interval         00:03:00

 Local Index   Index Version: 1296217097569, Generation: 12725

       Location: /data/solr/core/search-data/index
       Size: 944.32 MB
       Times Replicated Since Startup: 148
       Previous Replication Done At: Tue May 10 12:32:42 EDT 2011
       Config Files Replicated At: null
       Config Files Replicated: null
       Times Config Files Replicated Since Startup: null
       Next Replication Cycle At: Tue May 10 12:35:41 EDT 2011

 Current Replication Status    Start Time: Tue May 10 12:32:41 EDT 2011
       Files Downloaded: 18 / 108
       Downloaded: 317.48 KB / 436.24 MB [0.0%]
       Downloading File: _ayu.nrm, Downloaded: 4 bytes / 4 bytes [100.0%]
       Time Elapsed: 17s, Estimated Time Remaining: 23902s, Speed: 18.67 KB/s


 Thanks,
 Ravi Kiran Bhaskar

 On Tue, May 10, 2011 at 4:10 AM, Alexander Kanarsky
 alexan...@trulia.com wrote:
  Ravi,
 
  as far as I remember, this is how the replication logic works (see
  SnapPuller class, fetchLatestIndex method):
 
  1. Does the Slave get the whole index every time during replication or
  just the delta since the last replication happened ?
 
 
  It looks at the index version AND the index generation. If both the slave's
  version and generation are the same as on the master, nothing gets
  replicated. If the master's generation is greater than the slave's, the
  slave fetches the delta files only (even if a partial merge was done
  on the master) and puts the new files from the master into the same index
  folder on the slave (either index or index.timestamp, see further
  explanation). However, if the master's index generation is equal to or
  less than the one on the slave, the slave does a full replication by
  fetching all files of the master's index and placing them into a
  separate folder on the slave (index.timestamp). Then, if the fetch is
  successful, the slave updates (or creates) the index.properties file
  and puts there the name of the current index folder. The old
  index.timestamp folder(s) will be kept in 1.4.x - which was treated
  as a bug - see SOLR-2156 (this was fixed in 3.1). After this, the
  slave does a commit or reloads the core depending on whether the config files
  were replicated. There is another bug in 1.4.x that fails replication
  if the slave needs to do a full replication AND the config files were
  changed - also fixed in 3.1 (see SOLR-1983).
 
  2. If there are a huge number of queries being done on the slave, will it
  affect the replication? How can I improve the performance? (see the
  replication details at the bottom of the page)
 
 
  From my experience, half of the replication time is spent flushing the
  transferred data to disk, so the I/O impact is significant.
 
  3. Will the segment names be the same on master and slave after
  replication? I see that they are different. Is this correct? If it
  is correct, how does the slave know what to fetch the next time, i.e.
  the delta?
 
 
  They should be the same. The 
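The version/generation comparison described in this thread can be sketched as follows. This is an illustrative reconstruction of the logic explained above, not the actual SnapPuller code; the class, method, and enum names are invented for clarity:

```java
// Illustrative sketch of the replication decision described in this thread.
// Not the real SnapPuller code; names are invented for clarity.
public class ReplicationDecision {

    enum Action { NONE, DELTA, FULL }

    static Action decide(long masterVersion, long masterGeneration,
                         long slaveVersion, long slaveGeneration) {
        if (masterVersion == slaveVersion && masterGeneration == slaveGeneration) {
            return Action.NONE;   // indexes identical: nothing to replicate
        }
        if (masterGeneration > slaveGeneration) {
            return Action.DELTA;  // fetch only the new files into the same index dir
        }
        // master generation <= slave generation: fetch the whole index
        // into a separate index.<timestamp> folder on the slave
        return Action.FULL;
    }

    public static void main(String[] args) {
        System.out.println(decide(100L, 10L, 100L, 10L)); // NONE
        System.out.println(decide(200L, 11L, 100L, 10L)); // DELTA
        System.out.println(decide(200L, 10L, 100L, 10L)); // FULL
    }
}
```

Under this reading, a slave whose generation somehow ends up equal to or ahead of the master's will keep doing full replications, which matches the symptom reported above.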

DIH help request: nested xml entities and xpath

2011-05-12 Thread Weiss, Eric
Apologies in advance if this topic/question has been previously answered… I have 
scoured the docs, mail archives, and web looking for answers with no luck. I am 
sure I am just being dense or missing something obvious… please point out my 
stupidity, as my head hurts trying to get this working.

Solr 3.1
Java 1.6
Eclipse/Tomcat 7/Maven 2.x

Goal: to extract manufacturer names from a repeating list of keywords each 
denoted by a Category, one of which is Manufacturer, and load them into a 
MsgKeywordMF field  (see xml below)

I have xml files I am loading via DIH. This is an abbreviated example of the xml 
data (each file has repeating Report items, and each Report has repeating MsgSet, 
Msg, MsgList, etc. items). Notice the nested repeating groups, namely MsgItems, 
within each document (Report):


<Report>
  <ReportMeta>
    <ReportDate>02/22/2011</ReportDate>
    …
  </ReportMeta>
  <MsgSet>
    <Msg>
      <SourceDocID>http://someurl.com/path/to/doc</SourceDocID>
      …
      <DocumentText>blah blah</DocumentText>
      <MsgList>
        <MsgItem>
          <MsgType>SomeType</MsgType>
          <Category>Location</Category>
          <Keyword>USA</Keyword>
        </MsgItem>
        <MsgItem>
          <MsgType>AnotherType</MsgType>
          <Category>Manufacturer</Category>
          <Keyword>Apple</Keyword>
        </MsgItem>
        …
      </MsgList>
    </Msg>
  </MsgSet>
</Report>
<Report>
…
</Report>
<Report>
…
</Report>
…

Here is my data-config.xml:


<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="fileload" rootEntity="false"
            processor="FileListEntityProcessor" fileName="^.*\.xml$"
            recursive="false" baseDir="/files/xml/">
      <entity name="report"
              rootEntity="true" pk="id"
              url="${fileload.fileAbsolutePath}"
              processor="XPathEntityProcessor"
              forEach="/Report/MsgSet/Msg" onError="skip"
              transformer="DateFormatTransformer,RegexTransformer">
        <field column="DocumentText" xpath="/Report/MsgSet/Msg/DocumentText"/>
        <field column="id" xpath="/Report/MsgSet/Msg/SourceDocID"/>
        <field column="MsgCategory" xpath="/Report/MsgSet/Msg/MsgList/MsgItem/Category" />
        <field column="MsgKeyword" xpath="/Report/MsgSet/Msg/MsgList/MsgItem/Keyword" />
        <field column="MsgKeywordMF" xpath="/Report/MsgSet/Msg/MsgList/MsgItem[Category='Manufacturer']/Keyword" />
        …
      </entity>
    </entity>
  </document>
</dataConfig>


As seen in my config and sample data above, I am extracting the repeating 
Keywords into the MsgKeyword field. Also, and the part that does NOT work, I am 
trying to extract into a separate field just the keywords that have a Category 
of Manufacturer:

<field column="MsgKeywordMF" xpath="/Report/MsgSet/Msg/MsgList/MsgItem[Category='Manufacturer']/Keyword" />

I have also tried:

<field column="MsgKeywordMF" xpath="/Report/MsgSet/Msg/MsgList/MsgItem[@Category='Manufacturer']/Keyword" />

…after changing Category to an attribute of MsgItem (<MsgItem Category="Location">), 
but it too fails to match.

I have tested my xpath notation against my xml data file using various xpath 
evaluator tools, like within Eclipse, and it matches perfectly…but I can't get 
it to match/work during import.

As far as I understand it, DIH does not support nested/correlated entities, at 
least not with XML data sources using nested entity tags. I've tried without 
success to nest entities, but I can't correlate the nested entity with the 
parent. I think the way I'm trying should work, but no luck so far…

BTW, I can't easily change the xml format, although it is possible with some 
pain…

Any ideas?

TIA,
-- Eric
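One possible cause, offered as an assumption rather than a confirmed diagnosis: XPathEntityProcessor implements only a limited XPath subset and, to my knowledge, does not evaluate element-value predicates like [Category='Manufacturer']. If that is the case, a workaround sketch is to capture the two parallel value lists and filter them with a ScriptTransformer. The function name keepManufacturers is made up, and this assumes every MsgItem has both a Category and a Keyword so the two lists line up in document order:

```xml
<!-- Sketch only: assumes MsgCategory and MsgKeyword arrive as parallel
     java.util.Lists in document order. -->
<dataConfig>
  <script><![CDATA[
    function keepManufacturers(row) {
      var cats = row.get('MsgCategory');
      var keys = row.get('MsgKeyword');
      var mf = new java.util.ArrayList();
      if (cats != null && keys != null) {
        for (var i = 0; i < cats.size() && i < keys.size(); i++) {
          // keep only keywords whose sibling Category is "Manufacturer"
          if ('Manufacturer' == cats.get(i)) mf.add(keys.get(i));
        }
      }
      row.put('MsgKeywordMF', mf);
      return row;
    }
  ]]></script>
  ...
  <entity name="report" processor="XPathEntityProcessor"
          forEach="/Report/MsgSet/Msg"
          transformer="script:keepManufacturers">
    <field column="MsgCategory" xpath="/Report/MsgSet/Msg/MsgList/MsgItem/Category"/>
    <field column="MsgKeyword"  xpath="/Report/MsgSet/Msg/MsgList/MsgItem/Keyword"/>
  </entity>
</dataConfig>
```

The parallel-list assumption is the weak point: if any MsgItem omits one of the two children, the indices shift and the filter pairs the wrong values.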



solr velocity.log setting

2011-05-12 Thread Yuhan Zhang
hi all,

I'm new to Solr and am trying to install it on Tomcat. However, an exception
is raised when the page http://localhost/solr/browse is visited:

 *FileNotFoundException: velocity.log (Permission denied)*

It looks like Solr is trying to create a velocity.log file in the Tomcat
root directory. How should I configure Solr to change the location that
velocity.log is written to?

Thank you.

Y
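By default Velocity writes its own runtime log (velocity.log) to the JVM's working directory, which under Tomcat is often not writable. One option is simply to start Tomcat from a writable working directory. Another, assuming your setup lets you feed engine properties to the VelocityResponseWriter (I am not certain exactly how Solr 3.1 loads them, so treat the file location and wiring as assumptions to verify), is a velocity.properties using Velocity's standard engine settings:

```properties
# Illustrative velocity.properties; the property names are standard
# Velocity engine settings, but check how your Solr version passes
# properties to the VelocityResponseWriter before relying on this.
runtime.log = /var/log/tomcat/velocity.log

# Alternatively, disable Velocity's own log file entirely:
# runtime.log.logsystem.class = org.apache.velocity.runtime.log.NullLogSystem
```

The NullLogSystem variant sidesteps the permission problem altogether at the cost of losing Velocity's own diagnostics.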


Re: DIH help request: nested xml entities and xpath

2011-05-12 Thread Ashique
Hi All,

I am a Java/J2EE programmer and very new to Solr. I would like to index a
table in a PostgreSQL database into Solr, then search the records from a
GUI (JSP page) and show the results in tabular form. Could anyone help me
out with a simple code sample?

Thank you.

Regards,
Ashique
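As a starting-point sketch rather than a definitive recipe, a table can be pulled in with the DataImportHandler. Every connection detail, table name, and column name below is invented for illustration, and the PostgreSQL JDBC driver jar needs to be on Solr's classpath:

```xml
<!-- data-config.xml: illustrative names throughout -->
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/mydb"
              user="solr" password="secret"/>
  <document>
    <!-- one Solr document per table row -->
    <entity name="item" query="SELECT id, title, description FROM items">
      <field column="id"          name="id"/>
      <field column="title"       name="title"/>
      <field column="description" name="description"/>
    </entity>
  </document>
</dataConfig>
```

With the handler registered in solrconfig.xml, /dataimport?command=full-import loads the table, and the JSP page can then query /select (directly over HTTP or through SolrJ) and render the response rows as a table.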



Faceting question

2011-05-12 Thread Mark
Is there any way to perform a search that searches across 2 fields yet 
only gives me facet counts for documents matching 1 field?


For example:

If I have fields A and B, I would like to match my query across either of 
these two fields. I would then like facet counts for how many documents 
matched in field A only.


Can this be accomplished? If not out of the box, what classes should I look 
into to create this myself?


Thanks
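One way to get a count like this out of the box, offered as a sketch rather than a definitive answer, is a facet.query alongside the main query (the field names A and B and the term "widget" are placeholders):

```
q=A:widget OR B:widget
&facet=true
&facet.query=A:widget
```

The facet.query result then reports how many documents in the full result set match in field A. If instead you need per-term facet.field counts restricted to documents matching field A, that likely needs two requests (one filtered with fq=A:widget) or a custom search component.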


Fieldcollapsing patch not applied properly

2011-05-12 Thread Isha Garg

Hi Kai,
  As per your previous mails, you have already applied the
patches to Solr 1.4. I followed the steps of your mail accordingly,
but during step 9 I got the error
"1 out of 1 hunk FAILED". When I apply only
SOLR-236-1_4_1-paging-totals-working.patch it builds successfully, but the
changes are not reflected in the Solr source.

Kindly tell me where I am going wrong.
Steps are:

1. Downloaded [solr]
2. Downloaded [SOLR-236-1_4_1-paging-totals-working.patch]
3. Changed line 2837 of that patch to `@@ -0,0 +1,511 @@`
4. Downloaded [SOLR-236-1_4_1-NPEfix.patch]
5. Extracted the Solr archive
6. Applied both patches:
7. `cd apache-solr-1.4.1`
8. `patch -p0 < ../SOLR-236-1_4_1-paging-totals-working.patch`
9. `patch -p0 < ../SOLR-236-1_4_1-NPEfix.patch`
10. Built Solr:
11. `ant clean`
12. `ant example` ... tells me BUILD SUCCESSFUL

Thanks in advance!
Isha Garg

