Re: Fuzzy Query Param

2011-06-30 Thread Floyd Wu
If this is an edit distance implementation, what is the result when applied
to a CJK query? For example, 您好~3

Floyd


2011/6/30 entdeveloper cameron.develo...@gmail.com

 I'm using Solr trunk.

 If it's Levenshtein/edit distance, that's great, that's what I want. It just
 didn't seem to be officially documented anywhere so I wanted to find out
 for
 sure. Thanks for confirming.

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Fuzzy-Query-Param-tp3120235p3122418.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: conditionally update document on unique id

2011-06-30 Thread Shalin Shekhar Mangar
On Thu, Jun 30, 2011 at 2:06 AM, Yonik Seeley yo...@lucidimagination.comwrote:

 On Wed, Jun 29, 2011 at 4:32 PM, eks dev eks...@googlemail.com wrote:
  req.getSearcher().getFirstMatch(t) != -1;

 Yep, this is currently the fastest option we have.


Just for my understanding, this method won't use any caches but still may be
faster across repeated runs for the same token? I'm asking because Eks said
that they have 50%-55% duplicate documents.

-- 
Regards,
Shalin Shekhar Mangar.
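
For reference, a minimal sketch of the check being discussed (not Eks's
actual code); the unique-key field name "id" is an assumption, and in
practice this would sit inside an update processor or handler that has a
SolrQueryRequest available:

import java.io.IOException;
import org.apache.lucene.index.Term;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.SolrIndexSearcher;

public class DuplicateCheck {
  // Returns true if a document with the given unique-key value is already indexed.
  public static boolean exists(SolrQueryRequest req, String id) throws IOException {
    SolrIndexSearcher searcher = req.getSearcher();            // searcher bound to this request
    return searcher.getFirstMatch(new Term("id", id)) != -1;   // -1 means no existing match
  }
}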


Taxonomy faceting

2011-06-30 Thread Russell B
I have a hierarchical taxonomy of documents that I would like users to be
able to explore either through free-text search or drill-down faceting.  The
documents may appear at multiple points in the hierarchy.  I've got a
solution working as follows: a multivalued field labelled 'category' which
for each document defines where in the tree it should appear.  For example:
doc1 has the category field set to 0/topics, 1/topics/computing,
2/topic/computing/systems.

I then facet on the 'category' field, filter the results with fq={!raw
f=category}1/topics/computing to get everything below that point on the
tree, and use f.category.facet.prefix to restrict the facet fields to the
current level.

Full query something like:

http://localhost:8080/solr/select/?q=something&facet=true&facet.field=category&fq={!raw f=category}1/topics/computing&f.category.facet.prefix=2/topic/computing


Playing around with the results, it seems to work ok but despite reading
lots about faceting I can't help feel there might be a better solution.  Are
there better ways to achieve this?  Any comments/suggestions are welcome.

(Any suggestions as to what interface I can put on top of this are also
gratefully received!).


Thanks,

Russell
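
For what it's worth, a minimal SolrJ sketch of the same request; the server
URL and the hard-coded query/prefix values are just the ones from the example
above, not a recommendation:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TaxonomyDrilldown {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
    SolrQuery q = new SolrQuery("something");
    q.setFacet(true);
    q.addFacetField("category");
    // keep only documents below topics/computing in the tree
    q.addFilterQuery("{!raw f=category}1/topics/computing");
    // show only the next level down as facet values
    q.set("f.category.facet.prefix", "2/topic/computing");
    QueryResponse rsp = server.query(q);
    System.out.println(rsp.getFacetField("category").getValues());
  }
}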


MergeFactor effect on indexes

2011-06-30 Thread Romi
my solrconfig.xml configuration is as follows:

<mainIndex>
  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>32</ramBufferSizeMB>
  <mergeFactor>5</mergeFactor>
  <maxMergeDocs>10</maxMergeDocs>
  <maxFieldLength>1</maxFieldLength>
  <unlockOnStartup>false</unlockOnStartup>
</mainIndex>

and the index size is 12 MB. But when I change my mergeFactor I am not seeing
any effect on my indexes, i.e. the number of segments stays exactly the same.
I am not sure which configuration setting affects the number of segments; I
assume it is mergeFactor. My next question is which configuration defines the
number of docs per segment, and how large a segment gets before the next
segment is created.

Please clarify these points for me.


-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/MergerFacor-effect-on-indexes-tp3125146p3125146.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to use solr clustering to show in search results

2011-06-30 Thread Romi
wanted to use clustering in my search results, i configured solr for
clustering and i got following json for clusters. But i am not getting how
to use it to show in search results. as corresponding to one doc i have
number of fields and up till now i am showing name, description and id. now
in clusters i have labels and doc id. then how to use my docs in clusters, i
am really confused what to do Please reply. 

"clusters": [
  {
    "labels": ["Complement any Business Casual or Semi-formal Attire"],
    "docs": [7799, 7801]
  },
  {
    "labels": ["Design"],
    "docs": [8252, 7885]
  },
  {
    "labels": ["Elegant Ring has an Akoya Cultured Pearl"],
    "docs": [8142, 8139]
  },
  {
    "labels": ["Feel Amazing in these Scintillating Earrings Perfect"],
    "docs": [12250, 12254]
  },
  {
    "labels": ["Formal Evening Attire"],
    "docs": [8151, 8004]
  },
  {
    "labels": ["Pave Set"],
    "docs": [7788, 8169]
  },
  {
    "labels": ["Subtle Look or Layer it or Attach"],
    "docs": [8014, 8012]
  },
  {
    "labels": ["Three-stone Setting is Elegant and Fun"],
    "docs": [8335, 8337]
  },
  {
    "labels": ["Other Topics"],
    "docs": [8038, 7850, 7795, 7989, 7797]
  }
]


-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-use-solr-clustering-to-show-in-search-results-tp3125149p3125149.html
Sent from the Solr - User mailing list archive at Nabble.com.
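
One way to wire the two together on the client side is to index the documents
you already fetched by their id and then look them up per cluster. A hedged
SolrJ sketch follows; the id/name/description field names are taken from the
description above, and the cast of the "clusters" section to a list of
NamedLists is an assumption about how the response shown above is decoded:

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.util.NamedList;

public class ClusterJoin {
  // Group the already-fetched documents by cluster so each cluster can be
  // rendered with the same name/description/id fields used for the plain results.
  @SuppressWarnings("unchecked")
  public static void print(QueryResponse rsp, String idField) {
    Map<String, SolrDocument> byId = new HashMap<String, SolrDocument>();
    for (SolrDocument d : rsp.getResults()) {
      byId.put(String.valueOf(d.getFieldValue(idField)), d);
    }
    // "clusters" is the extra section the ClusteringComponent adds to the response
    List<NamedList<Object>> clusters =
        (List<NamedList<Object>>) rsp.getResponse().get("clusters");
    for (NamedList<Object> cluster : clusters) {
      System.out.println("Cluster: " + cluster.get("labels"));
      for (Object id : (List<Object>) cluster.get("docs")) {
        SolrDocument d = byId.get(String.valueOf(id));
        if (d != null) {
          System.out.println("  " + d.getFieldValue("name")
              + " - " + d.getFieldValue("description"));
        }
      }
    }
  }
}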


Adding german phonetic to solr

2011-06-30 Thread Jürgen Tiedemann
Hi all,

does Solr support German phonetic matching? Searching Google for how to add
German phonetics to Solr does not deliver good results, just lots of JIRA
stuff. I searched for Cologne phonetics too. The wikis
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=%28phonetic%29#solr.PhoneticFilterFactory
and http://wiki.apache.org/solr/LanguageAnalysis#German haven't answered my
question either. Please, can someone tell me how to do it, or where to look
for the appropriate information?

Nice regards

Jürgen


Re: Adding german phonetic to solr

2011-06-30 Thread Paul Libbrecht
Jürgen,

I haven't had the time to deploy it, but I heard about a Kölner Phonetik
implementation that was to be contributed as part of apache-commons-codec.
It probably still is just a patch in a JIRA issue:
https://issues.apache.org/jira/browse/CODEC-106
The contribution was posted to commons-dev on September 15th, 2010.

Making this reachable from Solr would be interesting, but it's a bit of work.

We have used the Double Metaphone indexer with Lucene with reasonable success
in ActiveMath, but it was not as fine-grained as the Kölner analyzer, and
fine-grainedness is really a desirable feature of a phonetic environment.
You might also want to take care of the proper nouns, for which traditional
phonetics is doomed to fail, at least if your texts contain a fair number of
international names!

paul


Le 30 juin 2011 à 11:58, Jürgen Tiedemann a écrit :

 Hi all,
 
 does solar support german phonetic? Searching for how to add german phonetic 
 to 
 solr on google does not deliver good results, just lots of JIRA stuff. I 
 searched for cologne phonetic too. The wikis 
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=%28phonetic%29#solr.PhoneticFilterFactory
 and http://wiki.apache.org/solr/LanguageAnalysis#German haven't also answered 
 my question. Please, can someone tell me how to do it or where to look for 
 appropriate information.
 
 Nice regards
 
 Jürgen



How to optimize solr indexes

2011-06-30 Thread Romi
When I open the solr/admin page I get the information below, and it shows
optimized: true. But I have not set optimize=true anywhere in my
configuration, so how are the indexes getting optimized? And how can I set it
to false?


Schema Information

Unique Key: UID_PK

Default Search Field: text

numDocs: 2881

maxDoc: 2881

numTerms: 41960

version: 1309429290159

optimized: true

current: true

hasDeletions: false

directory:
org.apache.lucene.store.SimpleFSDirectory:org.apache.lucene.store.SimpleFSDirectory@
C:\apache-solr-1.4.0\example\example-DIH\solr\db\data\index

lastModified: 2011-06-30T10:25:04.89Z


-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-optimize-solr-indexes-tp3125293p3125293.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Taxonomy faceting

2011-06-30 Thread darren
That's a good way. How does it perform?

Another way would be to store the parent topics in a field.
Whenever a parent node is drilled-into, simply search for all documents
with that parent. Perhaps not as elegant as your approach though.

I'd be interested in the performance comparison between the two approaches.

 I have a hierarchical taxonomy of documents that I would like users to be
 able to search either through search or drill-down faceting.  The
 documents may appear at multiple points in the hierarchy.  I've got a
 solution working as follows: a multivalued field labelled category which
 for
 each document defines where in the tree it should appear.  For example:
 doc1
 has the category field set to 0/topics, 1/topics/computing,
 2/topic/computing/systems.

 I then facet on the 'category' field, filter the results with fq={!raw
 f=category}1/topics/computing to get everything below that point on the
 tree, and use f.category.facet.prefix to restrict the facet fields to the
 current level.

 Full query something like:

 http://localhost:8080/solr/select/?q=something&facet=true&facet.field=category&fq={!raw f=category}1/topics/computing&f.category.facet.prefix=2/topic/computing


 Playing around with the results, it seems to work ok but despite reading
 lots about faceting I can't help feel there might be a better solution.
 Are
 there better ways to achieve this?  Any comments/suggestions are welcome.

 (Any suggestions as to what interface I can put on top of this are also
 gratefully received!).


 Thanks,

 Russell




Re: How to optimize solr indexes

2011-06-30 Thread Ahmet Arslan
 when i run solr/admin page i got this
 information, it shows optimize=true,
 but i have not set optimize=true in configuration file than
 how it is
 optimizing the indexes. and how can i set it to false then
 .
 
 
 /Schema Information
 
     Unique Key: UID_PK
 
     Default Search Field: text
 
     numDocs: 2881
 
     maxDoc: 2881
 
     numTerms: 41960
 
     version: 1309429290159
 
     optimized: true
 
     current: true
 
     hasDeletions: false
 
     directory:
 org.apache.lucene.store.SimpleFSDirectory:org.apache.lucene.store.SimpleFSDirectory@
 C:\apache-solr-1.4.0\example\example-DIH\solr\db\data\index
 
     lastModified: 2011-06-30T10:25:04.89Z/
 

It seems that you are using DIH. By default, both delta-import and
full-import issue an optimize at the end.


Re: Fuzzy Query Param

2011-06-30 Thread Michael McCandless
Good question... I think in Lucene 4.0 the edit distance is (will be) measured
in Unicode code points, but in past releases it's UTF-16 code units.

Mike McCandless

http://blog.mikemccandless.com
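
To make the difference concrete: BMP characters such as 您 and 好 occupy one
UTF-16 code unit each, so the two measures agree for them, but supplementary
characters take two units. A small plain-Java illustration (U+20BB7 is used
here as an arbitrary supplementary CJK character):

public class EditDistanceUnits {
  public static void main(String[] args) {
    String bmp = "您好";                // two BMP characters
    String supp = "您好\uD842\uDFB7";   // adds U+20BB7, encoded as a surrogate pair
    System.out.println(bmp.length() + " UTF-16 units, "
        + bmp.codePointCount(0, bmp.length()) + " code points");    // 2 / 2
    System.out.println(supp.length() + " UTF-16 units, "
        + supp.codePointCount(0, supp.length()) + " code points");  // 4 / 3
  }
}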

2011/6/30 Floyd Wu floyd...@gmail.com:
 if this is edit distance implementation, what is the result apply to CJK
 query? For example, 您好~3

 Floyd


 2011/6/30 entdeveloper cameron.develo...@gmail.com

 I'm using Solr trunk.

 If it's levenstein/edit distance, that's great, that's what I want. It just
 didn't seem to be officially documented anywhere so I wanted to find out
 for
 sure. Thanks for confirming.

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Fuzzy-Query-Param-tp3120235p3122418.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: How to optimize solr indexes

2011-06-30 Thread Romi
And if I want to set optimize=false, what do I need to do?

-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-optimize-solr-indexes-tp3125293p3125474.html
Sent from the Solr - User mailing list archive at Nabble.com.


AW: Adding german phonetic to solr

2011-06-30 Thread Jürgen Tiedemann
Hi Paul,

thanks for the quick reply. I replaced commons-codec-1.4.jar with 
commons-codec-1.5.jar to get the ColognePhonetic. In schema.xml I added

<filter class="solr.PhoneticFilterFactory" encoder="ColognePhonetic"
inject="true"/>

but then I get

org.apache.solr.common.SolrException: Unknown encoder: ColognePhonetic 
[[CAVERPHONE, SOUNDEX, METAPHONE, DOUBLEMETAPHONE, REFINEDSOUNDEX]].

How do I get PhoneticFilterFactory to know ColognePhonetic? Or is my approach 
completely wrong?

Jürgen







Von: Paul Libbrecht p...@hoplahup.net
An: solr-user@lucene.apache.org
Gesendet: Donnerstag, den 30. Juni 2011, 12:09:18 Uhr
Betreff: Re: Adding german phonetic to solr

Jürgen,

I haven't had the time to deploy it but i heard about Kölner Phonetik that 
was 
to be contributed as part of apache-commons-codec.
It probably still is just a patch in a jira issue.
https://issues.apache.org/jira/browse/CODEC-106
The contribution was posted to commons-dev on september 15th 2010.

Bringing this reachable into Solr would be interesting but it's a bit of work.

We have used the Double-Metaphone indexer with Lucene with reasonable success 
in 
ActiveMath but it was not as fine as the Kölner analyzer and fine-graininess is 
really a desirable feature of a phonetic environment.
You might want to also care for all the proper nouns around for which 
tradition phonetics is doomed to fail if, at least, your texts are a bit with 
international names!

paul


Le 30 juin 2011 à 11:58, Jürgen Tiedemann a écrit :

 Hi all,
 
 does solar support german phonetic? Searching for how to add german phonetic 
to 

 solr on google does not deliver good results, just lots of JIRA stuff. I 
 searched for cologne phonetic too. The wikis 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=%28phonetic%29#solr.PhoneticFilterFactory
 and http://wiki.apache.org/solr/LanguageAnalysis#German haven't also answered 
 my question. Please, can someone tell me how to do it or where to look for 
 appropriate information.
 
 Nice regards
 
 Jürgen

Re: Looking for Custom Highlighting guidance

2011-06-30 Thread Jamie Johnson
Thanks for the suggestion Mike, I will give that a shot.  Having no
familiarity with FastVectorHighlighter, is there somewhere specific I should
be looking?

On Wed, Jun 29, 2011 at 3:20 PM, Mike Sokolov soko...@ifactory.com wrote:

 Does the phonetic analysis preserve the offsets of the original text field?

 If so, you should probably be able to hack up FastVectorHighlighter to do 
 what you want.

 -Mike

 On 06/29/2011 02:22 PM, Jamie Johnson wrote:

 I have a schema with a text field and a text_phonetic field and would like
 to perform highlighting on them in such a way that the tokens that match are
 combined.  What would be a reasonable way to accomplish this?




Re: Looking for Custom Highlighting guidance

2011-06-30 Thread Mike Sokolov
It's going to be a bit complicated, but I would start by looking at 
providing a facility for merging an array of FieldTermStacks. The 
constructor for FieldTermStack() takes a fieldName and builds up a list 
of TermInfos (terms with positions and offsets): I *think* that if you 
make two of these, merge them, and hand that to the FieldPhraseList 
constructor (this is done in the main FVH class), you should get what 
you want.  This is a bit speculative; I haven't tried it.


-Mike
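
A very rough, untested sketch of that merge step against the Lucene 3.x
FastVectorHighlighter classes; it assumes FieldTermStack exposes
pop()/isEmpty() as in the 3.x sources, and the final hand-off to
FieldPhraseList still needs the constructor change described above:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import org.apache.lucene.search.vectorhighlight.FieldTermStack;
import org.apache.lucene.search.vectorhighlight.FieldTermStack.TermInfo;

public class FieldTermStackMerger {
  // Drain the per-field stacks (e.g. "text" and "text_phonetic") into a single
  // position-ordered list of TermInfos. Both fields must have been analyzed from
  // the same stored text so that offsets line up.
  public static List<TermInfo> merge(FieldTermStack a, FieldTermStack b) {
    List<TermInfo> merged = new ArrayList<TermInfo>();
    while (!a.isEmpty()) merged.add(a.pop());
    while (!b.isEmpty()) merged.add(b.pop());
    Collections.sort(merged, new Comparator<TermInfo>() {
      public int compare(TermInfo x, TermInfo y) {
        return x.getPosition() - y.getPosition();   // order by token position
      }
    });
    // Hand-off step (not shown): FieldPhraseList would need a constructor or
    // factory that accepts this merged list, which requires a small patch.
    return merged;
  }
}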

On 06/30/2011 08:26 AM, Jamie Johnson wrote:

Thanks for the suggestion Mike, I will give that a shot.  Having no
familiarity with FastVectorHighlighter is there somewhere specific I
should be looking?

On Wed, Jun 29, 2011 at 3:20 PM, Mike Sokolovsoko...@ifactory.com  wrote:
   

Does the phonetic analysis preserve the offsets of the original text field?

If so, you should probably be able to hack up FastVectorHighlighter to do what 
you want.

-Mike

On 06/29/2011 02:22 PM, Jamie Johnson wrote:
 

I have a schema with a text field and a text_phonetic field and would like
to perform highlighting on them in such a way that the tokens that match are
combined.  What would be a reasonable way to accomplish this?


   


Re: AW: Adding german phonetic to solr

2011-06-30 Thread Paul Libbrecht
Jürgen,

clearly the Cologne phonetic encoder is not yet supported; please see:

http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/java/org/apache/solr/analysis/PhoneticFilterFactory.java

One would need to add the line for the Cologne phonetic encoder there and recompile.

It'd make sense to open a JIRA issue for this.

paul
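
Until that happens, the encoder itself can already be exercised directly from
commons-codec 1.5; a minimal sketch (the example words are arbitrary) showing
what the filter would produce once it is wired into PhoneticFilterFactory or
a custom filter factory:

import org.apache.commons.codec.EncoderException;
import org.apache.commons.codec.language.ColognePhonetic;

public class ColognePhoneticDemo {
  public static void main(String[] args) throws EncoderException {
    ColognePhonetic encoder = new ColognePhonetic();  // shipped with commons-codec 1.5
    // Kölner Phonetik maps similar-sounding German words to the same code:
    System.out.println(encoder.encode("Meyer"));
    System.out.println(encoder.encode("Maier"));      // same code as "Meyer"
  }
}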

Le 30 juin 2011 à 14:24, Jürgen Tiedemann a écrit :

 Hi Paul,
 
 thanks for the quick reply. I replaced commons-codec-1.4.jar with 
 commons-codec-1.5.jar to get the ColognePhonetic. In schema.xml I added
 
 filter class=solr.PhoneticFilterFactory encoder=ColognePhonetic 
 inject=true/
 
 but then I get
 
 org.apache.solr.common.SolrException: Unknown encoder: ColognePhonetic 
 [[CAVERPHONE, SOUNDEX, METAPHONE, DOUBLEMETAPHONE, REFINEDSOUNDEX]].
 
 How do I get PhoneticFilterFactory to know ColognePhonetic? Or is my approach 
 completely wrong?
 
 Jürgen
 
 
 
 
 
 
 
 Von: Paul Libbrecht p...@hoplahup.net
 An: solr-user@lucene.apache.org
 Gesendet: Donnerstag, den 30. Juni 2011, 12:09:18 Uhr
 Betreff: Re: Adding german phonetic to solr
 
 Jürgen,
 
 I haven't had the time to deploy it but i heard about Kölner Phonetik that 
 was 
 to be contributed as part of apache-commons-codec.
 It probably still is just a patch in a jira issue.
https://issues.apache.org/jira/browse/CODEC-106
 The contribution was posted to commons-dev on september 15th 2010.
 
 Bringing this reachable into Solr would be interesting but it's a bit of work.
 
 We have used the Double-Metaphone indexer with Lucene with reasonable success 
 in 
 ActiveMath but it was not as fine as the Kölner analyzer and fine-graininess 
 is 
 really a desirable feature of a phonetic environment.
 You might want to also care for all the proper nouns around for which 
 tradition phonetics is doomed to fail if, at least, your texts are a bit with 
 international names!
 
 paul
 
 
 Le 30 juin 2011 à 11:58, Jürgen Tiedemann a écrit :
 
 Hi all,
 
 does solar support german phonetic? Searching for how to add german 
 phonetic 
 to 
 
 solr on google does not deliver good results, just lots of JIRA stuff. I 
 searched for cologne phonetic too. The wikis 
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=%28phonetic%29#solr.PhoneticFilterFactory
 and http://wiki.apache.org/solr/LanguageAnalysis#German haven't also 
 answered 
 my question. Please, can someone tell me how to do it or where to look for 
 appropriate information.
 
 Nice regards
 
 Jürgen



Re: How to optimize solr indexes

2011-06-30 Thread Ahmet Arslan


--- On Thu, 6/30/11, Romi romijain3...@gmail.com wrote:

 From: Romi romijain3...@gmail.com
 Subject: Re: How to optimize solr indexes
 To: solr-user@lucene.apache.org
 Date: Thursday, June 30, 2011, 3:01 PM
 and if i want to set it as
 optimize=false then what i need to do ??

When calling the import, use dataimport?command=delta-import&optimize=false

See the other commands available, like clean, commit, entity, etc.:
http://wiki.apache.org/solr/DataImportHandler#Commands


Re: Multicore clustering setup problem

2011-06-30 Thread Walter Closenfleight
Sure, thanks for having a look!

By the way, if I attempt to hit a solr URL, I get this error, followed by
the stacktrace. If I set abortOnConfigurationError to false (I've found you
must put the setting in both solr.xml and solrconfig.xml for both cores
otherwise you keep getting the error), then the main URL to solr (
http://localhost/solr) lists just the first core.

HTTP Status 500 - Severe errors in solr configuration. Check your log files
for more detailed information on what may be wrong. If you want solr to
continue after configuration errors, change:
<abortOnConfigurationError>false</abortOnConfigurationError> in solr.xml
-
org.apache.solr.common.SolrException: Error loading class
'org.apache.solr.handler.clustering.ClusteringComponent' at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
at

*Tomcat Log:*

INFO: [core1] Added SolrEventListener:
org.apache.solr.core.QuerySenderListener{queries=[{q=solr
rocks,start=0,rows=10}, {q=static firstSearcher warming query from
solrconfig.xml}]}
Jun 30, 2011 8:51:23 AM org.apache.solr.request.XSLTResponseWriter init
INFO: xsltCacheLifetimeSeconds=5
Jun 30, 2011 8:51:23 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error loading class
'org.apache.solr.handler.clustering.ClusteringComponent'
 at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
 at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
 at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435)
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498)
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492)
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525)
 at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:833)
 at org.apache.solr.core.SolrCore.init(SolrCore.java:551)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:428)
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
 at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
 at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
 at
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
 at
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
 at
org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:108)
 at
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)
 at
org.apache.catalina.core.StandardContext.start(StandardContext.java:4450)
 at
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
 at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
 at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
 at
org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:630)
 at
org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:556)
 at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:491)
 at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1206)
 at
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:314)
 at
org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
 at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
 at org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
 at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
 at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
 at org.apache.catalina.core.StandardService.start(StandardService.java:516)
 at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
 at org.apache.catalina.startup.Catalina.start(Catalina.java:583)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
 at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413)
Caused by: java.lang.ClassNotFoundException:
org.apache.solr.handler.clustering.ClusteringComponent
 at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
 at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:592)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
 at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:247)
 at

Re: Taxonomy faceting

2011-06-30 Thread Toke Eskildsen
On Thu, 2011-06-30 at 11:38 +0200, Russell B wrote:
 a multivalued field labelled category which for each document defines
 where in the tree it should appear.  For example: doc1 has the
 category field set to 0/topics, 1/topics/computing,
 2/topic/computing/systems.
 
 I then facet on the 'category' field, filter the results with fq={!raw
 f=category}1/topics/computing to get everything below that point on the
 tree, and use f.category.facet.prefix to restrict the facet fields to the
 current level.

Lucid Imagination did a webcast on this, as far as I remember?

 Playing around with the results, it seems to work ok but despite reading
 lots about faceting I can't help feel there might be a better solution.

The '1/topics/computing'-solution works at a single level, so if you are
interested in a multi-level result like
- topic
 - computing
  - hardware
  - software
 - biology
  - plants
  - animals
you have to do more requests.

 Are there better ways to achieve this?

Taxonomy faceting is a bit of a mess right now, but it is also an area
where a lot is happening. For SOLR, there is

https://issues.apache.org/jira/browse/SOLR-64
(single path/document hierarchical faceting)

https://issues.apache.org/jira/browse/SOLR-792
(pivot faceting, now part of trunk AFAIR)

https://issues.apache.org/jira/browse/SOLR-2412
(multi path/document hierarchical faceting, very experimental)

Just yesterday, another multi path/document hierarchical faceting
solution was added to the Lucene 3.x branch and Lucene trunk. It has
been used by IBM for some time and appears to be mature and stable.
https://issues.apache.org/jira/browse/LUCENE-3079
However, this solution requires a sidecar index for the taxonomy and I
am a bit worried about how this fits into the Solr index workflow.



Re: Text field case sensitivity problem

2011-06-30 Thread Jamie Johnson
I'm not familiar with the CharFilters, I'll look into those now.

Is solr.LowerCaseFilterFactory not handling wildcards the expected behavior,
or is this a bug?

On Wed, Jun 15, 2011 at 4:34 PM, Mike Sokolov soko...@ifactory.com wrote:
 I wonder whether CharFilters are applied to wildcard terms?  I suspect they
 might be.  If that's the case, you could use the MappingCharFilter to
 perform lowercasing (and strip diacritics too if you want that)

 -Mike

 On 06/15/2011 10:12 AM, Jamie Johnson wrote:

 So simply lower casing the works but can get complex.  The query that I'm
 executing may have things like ranges which require some words to be upper
 case (i.e. TO).  I think this would be much better solved on Solrs end, is
 there a JIRA about this?

 On Tue, Jun 14, 2011 at 5:33 PM, Mike Sokolov soko...@ifactory.com wrote:

 opps, please s/Highlight/Wildcard/

 On 06/14/2011 05:31 PM, Mike Sokolov wrote:

 Wildcard queries aren't analyzed, I think?  I'm not completely sure what
 the best workaround is here: perhaps simply lowercasing the query terms
 yourself in the application.  Also - I hope someone more knowledgeable will
 say that the new HighlightQuery in trunk doesn't have this restriction, but
 I'm not sure about that.

 -Mike

 On 06/14/2011 05:13 PM, Jamie Johnson wrote:

 Also of interest to me is this returns results
 http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kristine


 On Tue, Jun 14, 2011 at 5:08 PM, Jamie Johnsonjej2...@gmail.com
  wrote:

 I am using the following for my text field:

 <fieldType name="text" class="solr.TextField"
     positionIncrementGap="100" autoGeneratePhraseQueries="true">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <!-- in this example, we will only use synonyms at query time
     <filter class="solr.SynonymFilterFactory"
         synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
     -->
     <!-- Case insensitive stop word removal.
       add enablePositionIncrements=true in both the index and query
       analyzers to leave a 'gap' for more accurate phrase queries.
     -->
     <filter class="solr.StopFilterFactory"
         ignoreCase="true"
         words="stopwords.txt"
         enablePositionIncrements="true"
         />
     <filter class="solr.WordDelimiterFilterFactory"
         generateWordParts="1" generateNumberParts="1" catenateWords="1"
         catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.KeywordMarkerFilterFactory"
         protected="protwords.txt"/>
     <filter class="solr.PorterStemFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
         ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory"
         ignoreCase="true"
         words="stopwords.txt"
         enablePositionIncrements="true"
         />
     <filter class="solr.WordDelimiterFilterFactory"
         generateWordParts="1" generateNumberParts="1" catenateWords="0"
         catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.KeywordMarkerFilterFactory"
         protected="protwords.txt"/>
     <filter class="solr.PorterStemFilterFactory"/>
   </analyzer>
 </fieldType>

 I have a field defined as
 <field name="Person_Name" type="text" stored="true" indexed="true"/>

 when I execute a go to the following url I get results
 http://localhost:8983/solr/select?defType=lucene&q=Person_Name:kris*
 but if I do
 http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kris*
 I get nothing.  I thought the LowerCaseFilterFactory would have handled
 lowercasing both the query and what is being indexed, am I missing
 something?





Re: Text field case sensitivity problem

2011-06-30 Thread Jamie Johnson
I think my answer is here...

On wildcard and fuzzy searches, no text analysis is performed on the
search word. 

taken from http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers
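
Given that, one workaround is to lowercase the wildcard term on the client
before building the query, leaving range keywords like TO untouched; a
minimal sketch, where the field name is just the one from this thread and
escaping of other query-syntax characters is not handled:

import java.util.Locale;

public class WildcardQueryHelper {
  // Lowercase only the user-supplied term, mirroring what LowerCaseFilterFactory
  // did at index time, then append the wildcard ourselves.
  public static String prefixQuery(String field, String userTerm) {
    return field + ":" + userTerm.toLowerCase(Locale.ROOT) + "*";
  }

  public static void main(String[] args) {
    System.out.println(prefixQuery("Person_Name", "Kris"));  // Person_Name:kris*
  }
}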


On Thu, Jun 30, 2011 at 10:23 AM, Jamie Johnson jej2...@gmail.com wrote:
 I'm not familiar with the CharFilters, I'll look into those now.

 Is the solr.LowerCaseFilterFactory not handling wildcards the expected
 result or is this a bug?

 On Wed, Jun 15, 2011 at 4:34 PM, Mike Sokolov soko...@ifactory.com wrote:
 I wonder whether CharFilters are applied to wildcard terms?  I suspect they
 might be.  If that's the case, you could use the MappingCharFilter to
 perform lowercasing (and strip diacritics too if you want that)

 -Mike

 On 06/15/2011 10:12 AM, Jamie Johnson wrote:

 So simply lower casing the works but can get complex.  The query that I'm
 executing may have things like ranges which require some words to be upper
 case (i.e. TO).  I think this would be much better solved on Solrs end, is
 there a JIRA about this?

 On Tue, Jun 14, 2011 at 5:33 PM, Mike Sokolov soko...@ifactory.com wrote:

 opps, please s/Highlight/Wildcard/

 On 06/14/2011 05:31 PM, Mike Sokolov wrote:

 Wildcard queries aren't analyzed, I think?  I'm not completely sure what
 the best workaround is here: perhaps simply lowercasing the query terms
 yourself in the application.  Also - I hope someone more knowledgeable will
 say that the new HighlightQuery in trunk doesn't have this restriction, but
 I'm not sure about that.

 -Mike

 On 06/14/2011 05:13 PM, Jamie Johnson wrote:

 Also of interest to me is this returns results
 http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kristine


 On Tue, Jun 14, 2011 at 5:08 PM, Jamie Johnsonjej2...@gmail.com
  wrote:

 I am using the following for my text field:

 <fieldType name="text" class="solr.TextField"
     positionIncrementGap="100" autoGeneratePhraseQueries="true">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <!-- in this example, we will only use synonyms at query time
     <filter class="solr.SynonymFilterFactory"
         synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
     -->
     <!-- Case insensitive stop word removal.
       add enablePositionIncrements=true in both the index and query
       analyzers to leave a 'gap' for more accurate phrase queries.
     -->
     <filter class="solr.StopFilterFactory"
         ignoreCase="true"
         words="stopwords.txt"
         enablePositionIncrements="true"
         />
     <filter class="solr.WordDelimiterFilterFactory"
         generateWordParts="1" generateNumberParts="1" catenateWords="1"
         catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.KeywordMarkerFilterFactory"
         protected="protwords.txt"/>
     <filter class="solr.PorterStemFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
         ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory"
         ignoreCase="true"
         words="stopwords.txt"
         enablePositionIncrements="true"
         />
     <filter class="solr.WordDelimiterFilterFactory"
         generateWordParts="1" generateNumberParts="1" catenateWords="0"
         catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.KeywordMarkerFilterFactory"
         protected="protwords.txt"/>
     <filter class="solr.PorterStemFilterFactory"/>
   </analyzer>
 </fieldType>

 I have a field defined as
 <field name="Person_Name" type="text" stored="true" indexed="true"/>

 when I execute a go to the following url I get results
 http://localhost:8983/solr/select?defType=lucene&q=Person_Name:kris*
 but if I do
 http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kris*
 I get nothing.  I thought the LowerCaseFilterFactory would have handled
 lowercasing both the query and what is being indexed, am I missing
 something?






Returning total matched document count with SolrJ

2011-06-30 Thread Kissue Kissue
Hi,

I am using Solr 3.1 and using the SolrJ client. Does anyone know how i can
get the *TOTAL* number of matched documents returned with the QueryResponse?
I am interested in the total documents matched not just the result returned
with the limit applied. Any help will be appreciated.

Thanks.


RE: Returning total matched document count with SolrJ

2011-06-30 Thread Michael Ryan
SolrDocumentList docs = queryResponse.getResults();
long totalMatches = docs.getNumFound();

-Michael
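
A slightly fuller, self-contained sketch for SolrJ 3.1; the server URL and
query are placeholders:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;

public class TotalMatches {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery query = new SolrQuery("*:*");
    query.setRows(10);                          // only 10 docs come back in the list...
    QueryResponse rsp = server.query(query);
    SolrDocumentList docs = rsp.getResults();
    System.out.println("returned: " + docs.size());
    System.out.println("total matched: " + docs.getNumFound());  // ...but this is the full match count
  }
}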


Problems with SolrCloud

2011-06-30 Thread Andrey Sapegin
Dear ladies and gentlemen.

Can I ask you to help me with SolrCloud?

1) I am trying to set up SolrCloud on 2 computers with 3 ZooKeepers, but it
fails :(

I need to set the ZooKeeper port to 8001, so I changed clientPort=8001 in
solr/zoo.cfg.

When I run the command from example C to start shard1, it works:
java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf
-DzkRun -DzkHost=localhost:9983,localhost:8574,localhost:9900  -jar
start.jar

But if I change it to and try to run shard1:
java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf
-DzkRun -DzkHost=localhost:8001,localhost:8004 -jar start.jar

it fails with the following message:
SEVERE: java.lang.IllegalArgumentException: solr/zoo_data/myid file is
missing

2) to solve it I tried to set
*-Dsolr.solr.home=/data/a.sapegin/SolrCloud/shard1*
(without any slashes in the end)

But then I receive another exception:
Caused by:
org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException:
Error processing /data/a.sapegin/SolrCloud/shard1//zoo.cfg

I think this // is a bug.


Could you please help?
Thank You in advance,
Kind Regards,

-- 

Andrey Sapegin,
Software Developer,

Unister GmbH
Dittrichring 18-20 | 04109 Leipzig

+49 (0)341 492885069,
+4915778339304,
andrey.sape...@unister-gmbh.de

www.unister.de



Re: Text field case sensitivity problem

2011-06-30 Thread Mike Sokolov
Yes, after posting that response, I read some more and came to the same 
conclusion... there seems to be some interest on the dev list in 
building a capability to specify an analysis chain for use with wildcard 
and related queries, but it doesn't exist now.


-Mike

On 06/30/2011 10:34 AM, Jamie Johnson wrote:

I think my answer is here...

On wildcard and fuzzy searches, no text analysis is performed on the
search word. 

taken from http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers


On Thu, Jun 30, 2011 at 10:23 AM, Jamie Johnsonjej2...@gmail.com  wrote:
   

I'm not familiar with the CharFilters, I'll look into those now.

Is the solr.LowerCaseFilterFactory not handling wildcards the expected
result or is this a bug?

On Wed, Jun 15, 2011 at 4:34 PM, Mike Sokolovsoko...@ifactory.com  wrote:
 

I wonder whether CharFilters are applied to wildcard terms?  I suspect they
might be.  If that's the case, you could use the MappingCharFilter to
perform lowercasing (and strip diacritics too if you want that)

-Mike

On 06/15/2011 10:12 AM, Jamie Johnson wrote:

So simply lower casing the works but can get complex.  The query that I'm
executing may have things like ranges which require some words to be upper
case (i.e. TO).  I think this would be much better solved on Solrs end, is
there a JIRA about this?

On Tue, Jun 14, 2011 at 5:33 PM, Mike Sokolovsoko...@ifactory.com  wrote:
   

opps, please s/Highlight/Wildcard/

On 06/14/2011 05:31 PM, Mike Sokolov wrote:
 

Wildcard queries aren't analyzed, I think?  I'm not completely sure what
the best workaround is here: perhaps simply lowercasing the query terms
yourself in the application.  Also - I hope someone more knowledgeable will
say that the new HighlightQuery in trunk doesn't have this restriction, but
I'm not sure about that.

-Mike

On 06/14/2011 05:13 PM, Jamie Johnson wrote:
   

Also of interest to me is this returns results
http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kristine


On Tue, Jun 14, 2011 at 5:08 PM, Jamie Johnsonjej2...@gmail.com
  wrote:

 

I am using the following for my text field:

<fieldType name="text" class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory"
        synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
      add enablePositionIncrements=true in both the index and query
      analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
        ignoreCase="true"
        words="stopwords.txt"
        enablePositionIncrements="true"
        />
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
        protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
        ignoreCase="true"
        words="stopwords.txt"
        enablePositionIncrements="true"
        />
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="0"
        catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
        protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

I have a field defined as
<field name="Person_Name" type="text" stored="true" indexed="true"/>

when I execute a go to the following url I get results
http://localhost:8983/solr/select?defType=lucene&q=Person_Name:kris*
but if I do
http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kris*
I get nothing.  I thought the LowerCaseFilterFactory would have handled
lowercasing both the query and what is being indexed, am I missing
something?

   


   
 


Re: MergeFactor effect on indexes

2011-06-30 Thread Tomás Fernández Löbbe
Hi Romi, after making the changes you'll have to index some documents to see
the impact; Solr won't change your index unless you add more documents and
commit them.
It looks like your maxMergeDocs parameter is too small, I would use a greater
value here.
You can see a good explanation of how the merge policy works in Solr here:

http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/

The default merge policy has changed in 3.x and trunk, so you can probably
also take a look at

http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

Regards,

Tomás

On Thu, Jun 30, 2011 at 6:47 AM, Romi romijain3...@gmail.com wrote:

 my solrconfig.xml configuration is as follows:

 <mainIndex>
   <useCompoundFile>false</useCompoundFile>
   <ramBufferSizeMB>32</ramBufferSizeMB>
   <mergeFactor>5</mergeFactor>
   <maxMergeDocs>10</maxMergeDocs>
   <maxFieldLength>1</maxFieldLength>
   <unlockOnStartup>false</unlockOnStartup>
 </mainIndex>

 and the index size is 12 MB. But when I change my mergeFactor I am not
 seeing any effect on my indexes, i.e. the number of segments stays exactly
 the same. I am not sure which configuration setting affects the number of
 segments; I assume it is mergeFactor. My next question is which
 configuration defines the number of docs per segment, and how large a
 segment gets before the next segment is created.

 Please clarify these points for me.


 -
 Thanks  Regards
 Romi
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/MergerFacor-effect-on-indexes-tp3125146p3125146.html
 Sent from the Solr - User mailing list archive at Nabble.com.



token exceeding provided text size error since Solr 3.2

2011-06-30 Thread getagrip

A bug was introduced between Solr 3.1 and 3.2.

With Solr 3.2 we are now getting the following error when querying
several PDF and Word documents:


SEVERE: org.apache.solr.common.SolrException: 
org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token 
17 exceeds length of provided text sized 168
at 
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:474)
at 
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:378)
at 
org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:116)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at 
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)

at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)

at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: 
org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token 
17 exceeds length of provided text sized 168
at 
org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233)
at 
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:467)

... 24 more




Re: Text field case sensitivity problem

2011-06-30 Thread Erik Hatcher
Jamie - there is a JIRA about this, at least one: 
https://issues.apache.org/jira/browse/SOLR-218

Erik
 
On Jun 15, 2011, at 10:12 , Jamie Johnson wrote:

 So simply lower casing the works but can get complex.  The query that I'm
 executing may have things like ranges which require some words to be upper
 case (i.e. TO).  I think this would be much better solved on Solrs end, is
 there a JIRA about this?
 
 On Tue, Jun 14, 2011 at 5:33 PM, Mike Sokolov soko...@ifactory.com wrote:
 
 opps, please s/Highlight/Wildcard/
 
 
 On 06/14/2011 05:31 PM, Mike Sokolov wrote:
 
 Wildcard queries aren't analyzed, I think?  I'm not completely sure what
 the best workaround is here: perhaps simply lowercasing the query terms
 yourself in the application.  Also - I hope someone more knowledgeable will
 say that the new HighlightQuery in trunk doesn't have this restriction, but
 I'm not sure about that.
 
 -Mike
 
 On 06/14/2011 05:13 PM, Jamie Johnson wrote:
 
 Also of interest to me is this returns results
 http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kristine
 
 
 On Tue, Jun 14, 2011 at 5:08 PM, Jamie Johnsonjej2...@gmail.com
 wrote:
 
 I am using the following for my text field:
 
 <fieldType name="text" class="solr.TextField"
     positionIncrementGap="100" autoGeneratePhraseQueries="true">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <!-- in this example, we will only use synonyms at query time
     <filter class="solr.SynonymFilterFactory"
         synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
     -->
     <!-- Case insensitive stop word removal.
       add enablePositionIncrements=true in both the index and query
       analyzers to leave a 'gap' for more accurate phrase queries.
     -->
     <filter class="solr.StopFilterFactory"
         ignoreCase="true"
         words="stopwords.txt"
         enablePositionIncrements="true"
         />
     <filter class="solr.WordDelimiterFilterFactory"
         generateWordParts="1" generateNumberParts="1" catenateWords="1"
         catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.KeywordMarkerFilterFactory"
         protected="protwords.txt"/>
     <filter class="solr.PorterStemFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
         ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory"
         ignoreCase="true"
         words="stopwords.txt"
         enablePositionIncrements="true"
         />
     <filter class="solr.WordDelimiterFilterFactory"
         generateWordParts="1" generateNumberParts="1" catenateWords="0"
         catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.KeywordMarkerFilterFactory"
         protected="protwords.txt"/>
     <filter class="solr.PorterStemFilterFactory"/>
   </analyzer>
 </fieldType>

 I have a field defined as
 <field name="Person_Name" type="text" stored="true" indexed="true"/>

 when I execute a go to the following url I get results
 http://localhost:8983/solr/select?defType=lucene&q=Person_Name:kris*
 but if I do
 http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kris*
 I get nothing.  I thought the LowerCaseFilterFactory would have handled
 lowercasing both the query and what is being indexed, am I missing
 something?
 
 



Re: Returning total matched document count with SolrJ

2011-06-30 Thread Kissue Kissue
Thanks Michael. Quite helpful.

On Thu, Jun 30, 2011 at 4:06 PM, Michael Ryan mr...@moreover.com wrote:

 SolrDocumentList docs = queryResponse.getResults();
 long totalMatches = docs.getNumFound();

 -Michael



Re: Strip Punctuation From Field

2011-06-30 Thread Tomás Fernández Löbbe
Not that I'm aware of. This is probably something you want to do at the
application layer. If you want to do it in Solr, a good place would be an
UpdateRequestProcessor, but I guess you'll have to implement your own.
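
For anyone going the UpdateRequestProcessor route, a hedged sketch against
Solr 3.x (package names differ slightly in 1.4); the field name "status" is
hypothetical, and the factory still has to be registered in an
updateRequestProcessorChain in solrconfig.xml:

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class StripHashUpdateProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.solrDoc;            // the incoming document
        Object value = doc.getFieldValue("status");     // hypothetical field name
        if (value != null) {
          // Changing the value here affects both what is indexed and what is stored/returned.
          doc.setField("status", value.toString().replace("#", ""));
        }
        super.processAdd(cmd);
      }
    };
  }
}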

On Wed, Jun 29, 2011 at 4:12 PM, Curtis Wilde galv...@gmail.com wrote:

 From all I've read, using something like PatternReplaceFilterFactory allows
 you to replace / remove text in an index, but is there anything similar
 that
 allows manipulation of the text in the associated field? For example, if I
 pulled a status from Twitter like, Hi, this is a #hashtag. I would like
 to
 remove the # from that string and use it for both the index, and also the
 field value that is returned from a query, i.e., Hi, this is a hashtag.



Re: Text field case sensitivity problem

2011-06-30 Thread Mike Sokolov

Yes, and this too: https://issues.apache.org/jira/browse/SOLR-219

On 06/30/2011 12:46 PM, Erik Hatcher wrote:

Jamie - there is a JIRA about this, at least 
one:https://issues.apache.org/jira/browse/SOLR-218

Erik

On Jun 15, 2011, at 10:12 , Jamie Johnson wrote:

   

So simply lower casing the works but can get complex.  The query that I'm
executing may have things like ranges which require some words to be upper
case (i.e. TO).  I think this would be much better solved on Solrs end, is
there a JIRA about this?

On Tue, Jun 14, 2011 at 5:33 PM, Mike Sokolovsoko...@ifactory.com  wrote:

 

opps, please s/Highlight/Wildcard/


On 06/14/2011 05:31 PM, Mike Sokolov wrote:

   

Wildcard queries aren't analyzed, I think?  I'm not completely sure what
the best workaround is here: perhaps simply lowercasing the query terms
yourself in the application.  Also - I hope someone more knowledgeable will
say that the new HighlightQuery in trunk doesn't have this restriction, but
I'm not sure about that.

-Mike

On 06/14/2011 05:13 PM, Jamie Johnson wrote:

 

Also of interest to me is this returns results
http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kristine


On Tue, Jun 14, 2011 at 5:08 PM, Jamie Johnsonjej2...@gmail.com
wrote:

I am using the following for my text field:
   

<fieldType name="text" class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory"
        synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
      add enablePositionIncrements=true in both the index and query
      analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
        ignoreCase="true"
        words="stopwords.txt"
        enablePositionIncrements="true"
        />
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
        protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
        ignoreCase="true"
        words="stopwords.txt"
        enablePositionIncrements="true"
        />
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="0"
        catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
        protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

I have a field defined as
<field name="Person_Name" type="text" stored="true" indexed="true"/>

when I execute a go to the following url I get results
http://localhost:8983/solr/select?defType=lucene&q=Person_Name:kris*
but if I do
http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kris*
I get nothing.  I thought the LowerCaseFilterFactory would have handled
lowercasing both the query and what is being indexed, am I missing
something?


 
   


Wildcard search not working if full word is queried

2011-06-30 Thread Celso Pinto
Hi everyone,

I'm having some trouble figuring out why a query with an exact word
followed by the * wildcard, eg. teste*, returns no results while a
query for test* returns results that have the word teste in them.

I've created a couple of pasties:

Exact word with wildcard : http://pastebin.com/n9SMNsH0
Similar word: http://pastebin.com/jQ56Ww6b

Parameters other than title, description and content have no effect
other than filtering out unwanted results. In two of the four
results, the title has the complete word teste. In the other two,
the word appears in the other fields.

Does anyone have any insights about what I'm doing wrong?

Thanks in advance.

Regards,
Celso


Re: Multicore clustering setup problem

2011-06-30 Thread Stanislaw Osinski
It looks like the whole clustering component JAR is not in the classpath. I
remember that I once dealt with a similar issue in Solr 1.4 and the cause
was the relative path of the lib tag being resolved against the core's
instanceDir, which made the path incorrect when directly copying and pasting
from the single core configuration. Try correcting the relative lib paths
or replacing them with absolute ones, it should solve the problem.

Cheers,

Staszek


Re: Wildcard search not working if full word is queried

2011-06-30 Thread François Schiettecatte
I would run that word through the analyzer; I suspect that the word 'teste'
is being stemmed to 'test' in the index. At least, that is the first place I
would check.

François

On Jun 30, 2011, at 2:21 PM, Celso Pinto wrote:

 Hi everyone,
 
 I'm having some trouble figuring out why a query with an exact word
 followed by the * wildcard, eg. teste*, returns no results while a
 query for test* returns results that have the word teste in them.
 
 I've created a couple of pasties:
 
 Exact word with wildcard : http://pastebin.com/n9SMNsH0
 Similar word: http://pastebin.com/jQ56Ww6b
 
 Parameters other than title, description and content have no effect
 other than filtering out unwanted results. In a two of the four
 results, the title has the complete word teste. On the other two,
 the word appears in the other fields.
 
 Does anyone have any insights about what I'm doing wrong?
 
 Thanks in advance.
 
 Regards,
 Celso



Core Administration

2011-06-30 Thread zarni aung
Hi,

I am researching core administration using Solr.  My requirement is to be
able to provision/create/delete indexes dynamically.  I have tried it and it
works.  Apparently the core admin handler will create a new core given the
instance directory (required), along with the data directory, and so on.  The
issue I'm having is that a separate app that lives on a different machine
needs to create these new cores on demand, along with creating new schema.xml
files and data directories.  The required instance directory, data directory
and so on need to be separate for each core.

My first approach is to write a tool that would take additional params and
code-generate the schema and config files based on the different types of
documents, e.g. Homes, People, etc.

But I need to know if Solr already handles that case.  I wouldn't want to
have to write the tool if Solr already supports creating cores with new
configs on the fly.

Thanks,

Z
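
As far as the stock CoreAdmin handler goes, it does not generate schema.xml or
solrconfig.xml for you; the CREATE command only registers a core whose
instanceDir and configs already exist on the Solr machine (pushed there by
your own tooling). A minimal SolrJ sketch of that call, with placeholder core
name, path and URL:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class CreateCoreOnTheFly {
  public static void main(String[] args) throws Exception {
    // Talks to the CoreAdminHandler of a running multi-core Solr (placeholder URL).
    SolrServer admin = new CommonsHttpSolrServer("http://localhost:8983/solr");
    // The instanceDir (with conf/schema.xml and conf/solrconfig.xml already in
    // place) must exist on the Solr machine before this call.
    CoreAdminRequest.createCore("homes", "/opt/solr/cores/homes", admin);
  }
}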


Solr Importing database field issues . how to I use postgres pgpool connection?

2011-06-30 Thread rsaravanakumar
I am using a Postgres database and pgpool. The Postgres database port (5432)
is working fine, but the pgpool port is not working.

My import XML file (myproduct.xml):

Working:
<dataSource name="jdbc" driver="org.postgresql.Driver"
    url="jdbc:postgresql://localhost:5432/x"
    user="x" password="x" readOnly="true"
    autoCommit="false" />

Not working:
<dataSource name="jdbc" driver="org.postgresql.Driver"
    url="jdbc:postgresql://localhost:/x"
    user="x" password="x" readOnly="true"
    autoCommit="false" />

Is this a pgpool problem or a Solr problem? Please let me know what the issue
is, and how do I solve this pgpool problem?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Importing-database-field-issues-how-to-I-use-postgres-pgpool-connection-tp3126212p3126212.html
Sent from the Solr - User mailing list archive at Nabble.com.


Problems with SolrCloud

2011-06-30 Thread Andrey Sapegin
Dear ladies and gentlemen.

Can I ask you to help me with SolrCloud

1) I try to setup a SolrCloud on 2 computers with 3 Zookepers, but it
fails:(

I need to set Zookeper port to 8001, so I change clientPort=8001 in
solr/zoo.cfg.

When I try the command from the example C, to run shard1, it works:
java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf
-DzkRun -DzkHost=localhost:9983,localhost:8574,localhost:9900  -jar
start.jar

But if I change it to the following and try to run shard1:
java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf
-DzkRun -DzkHost=localhost:8001,localhost:8004 -jar start.jar

it fails with the following message:
SEVERE: java.lang.IllegalArgumentException: solr/zoo_data/myid file is
missing
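
(For reference, a plain standalone ZooKeeper node expects that file to contain
just its numeric server id, e.g. created with something like

echo 1 > solr/zoo_data/myid    # assuming this node is server 1

but I am not sure how that is supposed to work with the embedded zkRun setup.)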

2) To solve it, I tried to set
*-Dsolr.solr.home=/data/a.sapegin/SolrCloud/shard1*
(without any slashes at the end).

But then I receive another exception:
Caused by:
org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException:
Error processing /data/a.sapegin/SolrCloud/shard1//zoo.cfg

I think this // is a bug.


Could you please help?
Thank You in advance,
Kind Regards,

-- 

Andrey Sapegin,
Software Developer,

Unister GmbH
Dittrichring 18-20 | 04109 Leipzig

+49 (0)341 492885069,
+4915778339304,
andrey.sape...@unister-gmbh.de

www.unister.de



Re: Solr 3.2 filter cache warming taking longer than 1.4.1

2011-06-30 Thread Shawn Heisey

On 6/29/2011 10:16 PM, Shawn Heisey wrote:
I was thinking perhaps I might actually decrease the termIndexInterval 
value below the default of 128.  I know from reading the Hathi Trust 
blog that memory usage for the tii file is much more than the size of 
the file would indicate, but if I increase it from 13MB to 26MB, it 
probably would still be OK.


Decreasing the termIndexInterval to 64 almost doubled the tii file size, 
as expected.  It made the filterCache warming much faster, but made the 
queryResultCache warming very very slow.  Regular queries also seem like 
they're slower.


I am trying again with 256.  I may go back to the default before I'm 
done.  I'm guessing that a lot of trial and error was put into choosing 
the default value.
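
For anyone following along, the knob I am changing between trials is the
termIndexInterval setting in solrconfig.xml; roughly like this, with the value
shown being the current 256 trial and everything else left at the example
defaults:

<indexDefaults>
  <!-- trial value; the default is 128 -->
  <termIndexInterval>256</termIndexInterval>
</indexDefaults>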


It's been fun having a newer index available on my backup servers.  I've 
been able to do a lot of trials, learned a lot of things that don't work 
and a few that do.  I might do some experiments with trunk once I've 
moved off 1.4.1.


Thanks,
Shawn



Re: Core Administration

2011-06-30 Thread zarni aung
I have an idea.  I believe I can discover the properties of an object (via C#
reflection) and then code-gen the schema.xml file based on the field types and
other metadata of that type (possibly from a database).  After that, I should
be able to FTP the files over to the Solr machine, and then invoke CoreAdmin to
create the new index on the fly.  My original question remains: is there a tool
that already does what I'm describing?

Z

On Thu, Jun 30, 2011 at 2:32 PM, zarni aung zau...@gmail.com wrote:

 Hi,

 I am researching core administration using Solr.  My requirement is
 to be able to provision/create/delete indexes dynamically.  I have tried it
 and it works.  Apparently the CoreAdmin handler will create a new core when
 you specify the instance directory (required), along with the data directory,
 and so on.  The issue I'm having is that a separate app that lives on a
 different machine needs to create these new cores on demand, along with
 creating new schema.xml files and data directories.  The required instance
 directory, data directory and so on need to be separate for each core.

 My first approach is to write a tool that would take additional params and
 code-gen the schema/config files based on the different types of documents,
 i.e. Homes, People, etc.

 But I need to know if Solr already handles that case.  I wouldn't want to
 have to write the tool if Solr already supports creating cores with new
 configs on the fly.

 Thanks,

 Z



Re: Multicore clustering setup problem

2011-06-30 Thread Walter Closenfleight
Staszek,

That makes sense, but this has always been a multi-core setup, so the paths
have not changed, and the clustering component worked fine for core0. The
only thing new is that I have fine-tuned core1 (to begin implementing it).
Previously its solrconfig.xml file was very basic; I replaced it with
core0's solrconfig.xml and made very minor changes to it (unrelated to
clustering). It's a nearly identical solrconfig.xml file, so I'm surprised
it doesn't work for core1.

In other words, the paths here are the same for core0 and core1:
  <lib dir="../../dist/" regex="apache-solr-cell-\d.*\.jar" />
  <lib dir="../../dist/" regex="apache-solr-clustering-\d.*\.jar" />
  <lib dir="../../contrib/clustering/lib/downloads/" />
  <lib dir="../../contrib/clustering/lib/" />
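If I understand the suggestion, the absolute form would look something like this
(the /opt/solr install path is just a placeholder for wherever Solr actually
lives on our server):
  <!-- absolute paths; /opt/solr is a placeholder install path -->
  <lib dir="/opt/solr/dist/" regex="apache-solr-clustering-\d.*\.jar" />
  <lib dir="/opt/solr/contrib/clustering/lib/" />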
Again, I'm wondering whether, since both cores have the clustering component,
it should have a shared configuration in a different file used by both cores(?).
Perhaps the duplicate clusteringComponent configuration for both cores is the
problem?

Thanks for looking at this!

On Thu, Jun 30, 2011 at 1:29 PM, Stanislaw Osinski 
stanislaw.osin...@carrotsearch.com wrote:

 It looks like the whole clustering component JAR is not in the classpath. I
 remember that I once dealt with a similar issue in Solr 1.4 and the cause
 was the relative path of the lib tag being resolved against the core's
 instanceDir, which made the path incorrect when directly copying and
 pasting
 from the single core configuration. Try correcting the relative lib paths
 or replacing them with absolute ones; it should solve the problem.

 Cheers,

 Staszek



Re: Core Administration

2011-06-30 Thread Stefan Matheis

Zarni,

On 30.06.2011 20:32, zarni aung wrote:

But I need to know if Solr already handles that case.  I wouldn't want to
have to write the tool if Solr already supports creating cores with new
configs on the fly.


There isn't. You have to create the directory structure and the related
files yourself; Solr (the CoreAdminHandler) only activates the core for
usage.
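
The bare minimum on disk for a new core is something like this (the names here
are only an example):

solr/
  newcore/
    conf/
      schema.xml
      solrconfig.xml
    data/          (created by solr at startup if it's missing)

once that exists, the CoreAdminHandler's CREATE action can pick it up.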


A few weeks ago, there was a question about modifying configuration files
from the browser:
http://search.lucidimagination.com/search/document/ec79172e7613d1a/modifying_configuration_from_a_browser


Regards
Stefan


Re: Core Administration

2011-06-30 Thread zarni aung
Thank you very much Stefan.  This helps.

Zarni

On Thu, Jun 30, 2011 at 4:10 PM, Stefan Matheis 
matheis.ste...@googlemail.com wrote:

 Zarni,

 On 30.06.2011 20:32, zarni aung wrote:

  But I need to know if Solr already handles that case.  I wouldn't want to
 have to write the tool if Solr already supports creating cores with new
 configs on the fly.


 There isn't. You have to create the directory structure and the related files
 yourself; Solr (the CoreAdminHandler) only activates the core for usage.

 A few weeks ago, there was a question about modifying configuration files
 from the browser:
 http://search.lucidimagination.com/search/document/ec79172e7613d1a/modifying_configuration_from_a_browser

 Regards
 Stefan



Re: TermVectors and custom queries

2011-06-30 Thread Jamie Johnson
Perhaps a better question, is this possible?

On Mon, Jun 27, 2011 at 5:15 PM, Jamie Johnson jej2...@gmail.com wrote:
 I have a field named content with the following definition

     <field name="content" type="text" indexed="true" stored="true"
  multiValued="true" termVectors="true" termPositions="true"
  termOffsets="true"/>

 I'm now trying to execute a query against content and get back the term
 vectors for the pieces that matched my query, but I must be messing
 something up.  My query is as follows:

 http://localhost:8983/solr/select/?qt=tvrhq=content:testfl=contenttv.all=true

 where the word test is in my content field.  When I get information back
 though I am getting the term vectors for all of the tokens in that field.
 How do I get back just the ones that match my search?



Re: After the query component has the results, can I do more filtering on them?

2011-06-30 Thread arian487
Unfortunately the userIdsToScore updates very often; I'd get more IDs on almost
every single query (hence why I made the new component).  But I see the
problem of not being able to score the whole resultSet, and now that I think
about it, I'd actually need to do that.  I want to get a whole whack of users
(let's say 10,000), score them using my system, and then 'remember' the top
3500 of these users in the result cache or something.

How would I go about operating on the whole resultSet rather than just the
'rows' I set?  I wonder if I can set rows to be really large, score them in
the component, remember all of these results in the result cache, and then
dynamically change rows in my component so that not all 3500 (or whatever
number I choose) are returned.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/After-the-query-component-has-the-results-can-I-do-more-filtering-on-them-tp3114775p3127560.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: After the query component has the results, can I do more filtering on them?

2011-06-30 Thread arian487
Sorry for the double post but in this case, is it possible for me to access
the queryResultCache in my component and play with it?  Ideally what I want
is this:

1) I have 1 (just a random large number) total results. 
2) In my component I access all of these results, score them, and take the
top 3500 (a random smaller number) and drop the rest.  
3) The 3500 I have now should end up going into the queryResultCache and
essentially replacing the other one.
4) The number returned to the user should then be 'rows', and subsequent
identical queries just get the results from my new result cache.

I'm pretty noob about all of this, so I'm hoping someone can help.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/After-the-query-component-has-the-results-can-I-do-more-filtering-on-them-tp3114775p3127581.html
Sent from the Solr - User mailing list archive at Nabble.com.


JOIN, query on the parent?

2011-06-30 Thread Ryan McKinley
Hello-

I'm looking for a way to find all the links from a set of results.  Consider:

<doc>
 id:1
 type:X
 link:a
 link:b
</doc>

<doc>
 id:2
 type:X
 link:a
 link:c
</doc>

<doc>
 id:3
 type:Y
 link:a
</doc>

Is there a way to search for all the links from stuff of type X -- in
this case (a, b, c)?
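
(The closest thing I can see with what already exists is faceting on the link
field restricted to the parents, something like

  q=type:X&rows=0&facet=true&facet.field=link&facet.limit=-1&facet.mincount=1

which should list a, b, c as facet values, but that gives me counts for display
rather than something I can feed into another query.)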

If I'm understanding the {!join} stuff, it lets you search on the
children, but I don't really see how to limit the parent values.

Am I missing something, or is this a further extension to the JoinQParser?


thanks
ryan


Re: Taxonomy faceting

2011-06-30 Thread Chris Hostetter

: Lucid Imagination did a webcast on this, as far as I remember?

that was me ... the webcast was a pre-run of my apachecon talk...

http://www.lucidimagination.com/why-lucid/webinars/mastering-power-faceted-search
http://people.apache.org/~hossman/apachecon2010/facets/

...taxonomy stuff comes up ~slide 30

: The '1/topics/computing'-solution works at a single level, so if you are
: interested in a multi-level result like

if you want to show the whole tree when faceting you can just leave the
depth-number prefix out of the terms; that should work fine (but I haven't
thought about it hard)

:  Are there better ways to achieve this?
: 
: Taxonomy faceting is a bit of a mess right now, but it is also an area
: where a lot is happening. For SOLR, there is

Right, some of which I haven't been able to keep up on and can't comment
on -- but in my experience, if you are serious about organizing your data in a
taxonomy then you probably already have some data structure in your
application layer that models the whole thing in memory and maps nodeIds
to nodeLabels and whatnot.  What usually works fine is to just index the
nodeIds for the entire ancestry of the category each document is in and use
that for the filtering (ie: fq=cat:1234); to generate the facet
presentation you do a simple facet.field=ancestorCategories&facet.limit=-1
to get all the counts in a big hashmap, and then use that to annotate your
own category tree data structure that you use to generate the
presentation.
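
a rough sketch of the kind of request I mean (field names made up):

http://localhost:8983/solr/select?q=foo&fq=cat:1234&facet=true&facet.field=ancestorCategories&facet.limit=-1

...the counts come back keyed by nodeId, and the app maps them onto its
in-memory tree when rendering the page.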



-Hoss


Uninstall Solr

2011-06-30 Thread GAURAV PAREEK
Hi All,

How do I *uninstall* Solr completely?

Any help will be appreciated.

Regards,
Gaurav


Re: Uninstall Solr

2011-06-30 Thread Erik Hatcher
How'd you install it?

Generally you just delete the directory where you installed it.  But you 
might be deploying solr.war in a container somewhere besides Solr's example 
Jetty setup, in which case you need to undeploy it from those other containers 
and remove the remnants.

Curious though... why uninstall it?  Solr makes a mighty fine hammer to have 
around :)

Erik

On Jun 30, 2011, at 19:49 , GAURAV PAREEK wrote:

 Hi All,
 
 How do I *uninstall* Solr completely?
 
 Any help will be appreciated.
 
 Regards,
 Gaurav