Solr phonetics with spelling

2015-03-10 Thread Ashish Mukherjee
Hello,

Couple of questions related to phonetics -

1. If I enable the phonetic filter in managed-schema file for a particular
field, how does it affect the spell handler?

2. What is the meaning of the inject attribute within analyzer in
managed-schema? The documentation is not very clear about it.

Regards,
Ashish


Re: SolrCloud: Chroot error

2015-03-10 Thread Aman Tandon
Thanks Shawn, I tried it with a single string but still no success.

So currently I am running it without chroot and it is working fine.

With Regards
Aman Tandon

On Mon, Mar 9, 2015 at 9:46 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 3/9/2015 10:03 AM, Aman Tandon wrote:
  Thanks for replying, Just to send the mail, I replaced the IP addresses
  with the imaginary hostname, now the command is
 
  *./solr start -c -z localhost:2181,abc.com:2181
  http://abc.com:2181,xyz.com:2181/home/aman/solrcloud/solr_zoo
  http://xyz.com:2181/home/aman/solrcloud/solr_zoo -p 4567*

 The same URL replacement is still happening.  I think I know what you
 are doing, but I was hoping to have a clean string just to make sure.

 You should not be using localhost in the zkHost string unless there is
 only one zk server, or you are trying to start the entire cluster on one
 machine.  All of your Solr machines should have identical zkHost
 parameters.  That is not possible if they are separate machines and you
 use localhost.

 Your chroot should be very simple, as I mentioned in the other email.
 Using /solr is appropriate if you won't be sharing the zookeeper
 ensemble with multiple SolrCloud clusters.  The filesystem layout of
 your zookeeper install (bin, data, logs, etc) is NOT relevant for this
 chroot.  It exists only within the zookeeper database.

 Thanks,
 Shawn
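
A complete start command with a simple chroot, using hypothetical hostnames, would look roughly like this -- note the single /solr chroot at the very end of the ensemble list (not after each host), and that every Solr node would get this identical -z value:

```shell
bin/solr start -c -p 4567 \
  -z "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solr"
```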




Re: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-03-10 Thread Dmitry Kan
For the sake of story completeness, just wanted to confirm these params
had a positive effect:

-Dsolr.solr.home=cores -Xmx12000m -Djava.awt.headless=true -XX:+UseParNewGC
-XX:+ExplicitGCInvokesConcurrent -XX:+UseConcMarkSweepGC
-XX:MaxTenuringThreshold=8 -XX:CMSInitiatingOccupancyFraction=40

This freed up a couple dozen GBs on the Solr server!

On Tue, Feb 17, 2015 at 1:47 PM, Dmitry Kan solrexp...@gmail.com wrote:

 Thanks Toke!

 Now I consistently see the saw-tooth pattern on two shards with new GC
 parameters, next I will try your suggestion.

 The current params are:

 -Xmx25600m -XX:+UseParNewGC -XX:+ExplicitGCInvokesConcurrent
 -XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=8
 -XX:CMSInitiatingOccupancyFraction=40

 Dmitry

 On Tue, Feb 17, 2015 at 1:34 PM, Toke Eskildsen t...@statsbiblioteket.dk
 wrote:

 On Tue, 2015-02-17 at 11:05 +0100, Dmitry Kan wrote:
  Solr: 4.10.2 (high load, mass indexing)
  Java: 1.7.0_76 (Oracle)
  -Xmx25600m
 
 
  Solr: 4.3.1 (normal load, no mass indexing)
  Java: 1.7.0_11 (Oracle)
  -Xmx25600m
 
  The RAM consumption remained the same after the load has stopped on the
  4.10.2 cluster. Manually collecting the memory on a 4.10.2 shard via
  jvisualvm dropped the used RAM from 8.5G to 0.5G. But the reserved RAM as
  seen by top remained at the 9G level.

 As the JVM does not free OS memory once allocated, top just shows
 whatever peak it reached at some point. When you tell the JVM that it is
 free to use 25GB, it makes a lot of sense to allocate a fair chunk of
 that instead of garbage collecting if there is a period of high usage
 (mass indexing for example).

  What else could be the artifact of such a difference -- Solr or JVM? Can it
  only be explained by the mass indexing? What is worrisome is that the
  4.10.2 shard reserves 8x what it uses.

 If you set your Xmx to a lot less, the JVM will probably favour more
 frequent garbage collections over extra heap allocation.

 - Toke Eskildsen, State and University Library, Denmark





 --
 Dmitry Kan
 Luke Toolbox: http://github.com/DmitryKey/luke
 Blog: http://dmitrykan.blogspot.com
 Twitter: http://twitter.com/dmitrykan
 SemanticAnalyzer: www.semanticanalyzer.info




-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Re: Solrcloud Index corruption

2015-03-10 Thread Martin de Vries

Hi,


this _sounds_ like you somehow don't have indexed=true set for the
field in question.


We investigated a lot more. The CheckIndex tool didn't find any error. 
We now think the following happened:
- We changed the schema two months ago: we changed a field to 
indexed=true. We reloaded the cores, but two of them don't seem to have 
been reloaded (maybe we forgot).

- We reindexed all content. The new field worked fine.
- We think the leader changed to a server that didn't reload the core
- After that the field stopped working for newly indexed documents

Thanks for your help.


Martin




Erick Erickson schreef op 06.03.2015 17:02:


bq: You say in our case some docs didn't made it to the node, but
that's not really true: the docs can be found on the corrupted nodes
when I search on ID. The docs are also complete. The problem is that
the docs do not appear when I filter on certain fields

this _sounds_ like you somehow don't have indexed=true set for the
field in question. But it also sounds like you're saying that search
on that field works on some nodes but not on others, I'm assuming
you're adding distrib=false to verify this. It shouldn't be
possible to have different schema.xml files on the different nodes,
but you might try checking through the admin UI.

Network burps shouldn't be related here. If the content is stored,
then the info made it to Solr intact, so this issue shouldn't be
related to that.

Sounds like it may just be the bugs Mark is referencing, sorry I 
don't

have the JIRA numbers right off.

Best,
Erick

On Thu, Mar 5, 2015 at 4:46 PM, Shawn Heisey apa...@elyograg.org 
wrote:



On 3/5/2015 3:13 PM, Martin de Vries wrote:

I understand there is not a master in SolrCloud. In our case we use
haproxy as a load balancer for every request. So when indexing, every
document will be sent to a different solr server, immediately after
each other. Maybe SolrCloud is not able to handle that correctly?

SolrCloud can handle that correctly, but currently sending index
updates to a core that is not the leader of the shard will incur a
significant performance hit, compared to always sending updates to the
correct core. A small performance penalty would be understandable,
because the request must be redirected, but what actually happens is a
much larger penalty than anyone expected. We have an issue in Jira to
investigate that performance issue and make it work as efficiently as
possible. Indexing batches of documents is recommended, not sending one
document per update request. General performance problems with Solr
itself can lead to extremely odd and unpredictable behavior from
SolrCloud. Most often these kinds of performance problems are related
in some way to memory, either the java heap or available memory in the
system. http://wiki.apache.org/solr/SolrPerformanceProblems [1]

Thanks,

Shawn




Links:
--
[1] http://wiki.apache.org/solr/SolrPerformanceProblems
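
As a sketch of the batching idea Shawn recommends (field names and batch size here are made up, and in practice each batch would be posted to your own /update URL as a single request):

```python
# Sketch: group documents client-side so each update request carries
# many documents instead of one document per request.
def batch(docs, size):
    """Yield successive lists of at most `size` documents."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

# Hypothetical documents; each batch would be sent in one update request,
# e.g. POSTed as JSON to http://host:8983/solr/collection1/update.
docs = [{"id": str(n), "title_t": "doc %d" % n} for n in range(2500)]
batches = list(batch(docs, 1000))
# 2500 docs -> 3 requests (1000 + 1000 + 500) instead of 2500 requests.
```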


Re: SolrCloud: Chroot error

2015-03-10 Thread Shawn Heisey
On 3/10/2015 6:10 AM, Aman Tandon wrote:
 Thanks Shawn, I tried it with a single string but still no success.
 
 So currently I am running it without chroot and it is working fine.

That brings up something for me or you to try.  I wonder if perhaps
there is a bug that will prevent the directory creation from
happening.  I would imagine that if you create the directory manually,
Solr would work just fine.  My production cloud is running a very old
release - 4.2.1 - and I find it difficult to set up a full SolrCloud
test environment because I don't have a lot of hardware.

Thanks,
Shawn



Re: Solr 5: data_driven_schema_config's solrconfig causing error

2015-03-10 Thread Steve Rowe
Hi Aman,

The stack trace shows that the AddSchemaFieldsUpdateProcessorFactory specified 
in data_driven_schema_configs’s solrconfig.xml expects the “booleans” field 
type to exist.

Solr 5’s data_driven_schema_configs includes the “booleans” field type:

http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_5_0_0/solr/server/solr/configsets/data_driven_schema_configs/conf/managed-schema?view=markup#l249

So you must have removed it when you modified the schema?  Did you do this 
intentionally?  If so, why?
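
For reference, the stock definition is a multiValued boolean type along these lines (approximate, from memory -- verify against the linked managed-schema):

```xml
<fieldType name="booleans" class="solr.BoolField"
           sortMissingLast="true" multiValued="true"/>
```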

Steve

 On Mar 10, 2015, at 5:25 AM, Aman Tandon amantandon...@gmail.com wrote:
 
 Hi,
 
  For the sake of using the new schema.xml and solrconfig.xml with Solr 5, I
  put my old required field types and field names (being used with Solr 4.8.1)
  into the schema.xml given in *basic_configs*, plus the configuration settings
  given in the solrconfig.xml present in *data_driven_schema_configs*, and I put
  these configuration files in the configs of ZooKeeper.
  
  But when I am creating the core it is giving the error that the booleans
  fieldType is not found in the schema. So correct me if I am doing something
  wrong.
 
 ERROR - 2015-03-10 08:20:16.788; org.apache.solr.core.CoreContainer; Error
 creating core [core1]: fieldType 'booleans' not found in the schema
 org.apache.solr.common.SolrException: fieldType 'booleans' not found in
 the schema
 at org.apache.solr.core.SolrCore.init(SolrCore.java:896)
 at org.apache.solr.core.SolrCore.init(SolrCore.java:662)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:513)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:488)
 at
 org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:573)
 at
 org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:197)
 at
 org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
 at
 org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:736)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:261)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)
 at
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
 at
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 at
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
 at
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:368)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
 at
 org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
 at
 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
 at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
 at
 org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
 at
 org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: org.apache.solr.common.SolrException: fieldType 'booleans' not
 found in the schema
 at
 org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$TypeMapping.populateValueClasses(AddSchemaFieldsUpdateProcessorFactory.java:244)
 at
 org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory.inform(AddSchemaFieldsUpdateProcessorFactory.java:170)
 at
 org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:620)
 at org.apache.solr.core.SolrCore.init(SolrCore.java:879)
 ... 35 more
 ERROR - 2015-03-10 08:20:16.825; 

RE: Solr phonetics with spelling

2015-03-10 Thread Dyer, James
Ashish,

I would not recommend using spellcheck against a phonetic-analyzed field.  
Instead, you can use copyField to create a separate field that is lightly 
analyzed and use the copy for spelling.  
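
A sketch of that arrangement in the schema, with hypothetical field and type names:

```xml
<!-- Phonetic-analyzed field used for matching -->
<field name="name_phonetic" type="text_phonetic" indexed="true" stored="false"/>
<!-- Lightly analyzed copy used only for spellcheck -->
<field name="name_spell" type="text_general" indexed="true" stored="false"/>
<copyField source="name" dest="name_phonetic"/>
<copyField source="name" dest="name_spell"/>
```

The spellcheck component's field option in solrconfig.xml would then point at the lightly analyzed copy (name_spell here) rather than the phonetic field.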

James Dyer
Ingram Content Group


-Original Message-
From: Ashish Mukherjee [mailto:ashish.mukher...@gmail.com] 
Sent: Tuesday, March 10, 2015 7:05 AM
To: solr-user@lucene.apache.org
Subject: Solr phonetics with spelling

Hello,

Couple of questions related to phonetics -

1. If I enable the phonetic filter in managed-schema file for a particular
field, how does it affect the spell handler?

2. What is the meaning of the inject attribute within analyzer in
managed-schema? The documentation is not very clear about it.

Regards,
Ashish


Re: how to change configurations in solrcloud setup

2015-03-10 Thread Nitin Solanki
Hi Aman,
 You can apply configuration on solr cloud by using this
command -

sudo
path_of_solr/solr_folder_name/example/scripts/cloud-scripts/zkcli.sh
-zkhost localhost:9983 -cmd upconfig -confdir
path_of_solr/solr_folder_name/example/solr/collection1/conf -confname
default

and then restart all nodes of solrcloud.

On Mon, Mar 9, 2015 at 11:43 AM, Aman Tandon amantandon...@gmail.com
wrote:

 Please help.

 With Regards
 Aman Tandon

 On Sat, Mar 7, 2015 at 9:58 PM, Aman Tandon amantandon...@gmail.com
 wrote:

  Hi,
 
  Please tell me the best way to apply configuration changes in solr
  cloud and how to do that.
 
  Thanks in advance.
 
  With Regards
  Aman Tandon
 



Re: Chaining components in request handler

2015-03-10 Thread Alexandre Rafalovitch
Ok. Components then. Defined in solrconfig.xml. You can
prepend/append/replace the standard list.

Try that and see if that's enough.
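
As a sketch of what that looks like in solrconfig.xml (the component name and class here are hypothetical):

```xml
<searchComponent name="myComponent" class="com.example.MyComponent"/>

<requestHandler name="/pipeline" class="solr.SearchHandler">
  <!-- Replace the standard component list entirely... -->
  <arr name="components">
    <str>query</str>
    <str>myComponent</str>
  </arr>
  <!-- ...or instead keep the defaults and append with last-components:
  <arr name="last-components">
    <str>myComponent</str>
  </arr>
  -->
</requestHandler>
```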

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 10 March 2015 at 14:03, Ashish Mukherjee ashish.mukher...@gmail.com wrote:
 Would like to do it during querying.

 Thanks,
 Ashish

 On Tue, Mar 10, 2015 at 11:07 PM, Alexandre Rafalovitch arafa...@gmail.com
 wrote:

 Is that during indexing or during query phase?

 Indexing has UpdateRequestProcessors (e.g.
 http://www.solr-start.com/info/update-request-processors/ )
 Query has Components (e.g. Faceting, MoreLIkeThis, etc)

 Or something different?

 Regards,
Alex.
 
 Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
 http://www.solr-start.com/


 On 10 March 2015 at 13:34, Ashish Mukherjee ashish.mukher...@gmail.com
 wrote:
  Hello,
 
  I would like to create a request handler which chains components in a
  particular sequence to return the result, similar to a Unix pipe.
 
  eg. Component 1 -> result1 -> Component 2 -> result2
 
  result2 is final result returned.
 
  Component 1 may be a standard component, Component 2 may be out of the
 box.
 
  Is there any tutorial which describes how to wire together components
 like
  this in a single handler?
 
  Regards,
  Ashish



Re: Solr TCP layer

2015-03-10 Thread Saumitra Srivastav
Thanks everyone for the responses.

My motivation for TCP is coming from a very heavy indexing pipeline where
the smallest of optimizations matters. I am working on a machine data parser
which feeds data into Cassandra and Solr and we have SLAs based on how fast
we can make data available in both the sources. We used to have issues with
Cassandra as well but we optimized the s**t out of it.

Now we want to do the same with Solr. While I do realize that this is going
to be a lot of work, but if its something that will reap benefit in long
run, then so be it. Datastax provides a netty based layer in their
enterprise version which folks have reported to be faster. Now just because
a commercial vendor ships it doesn't mean we will jump into it without
thinking. We will definitely do an effect-vs-effort analysis before
committing to this. 

For majority of users, such high performance might not be a
requirement/priority, so I understand the reluctance to go down this path.

I think it would be best at this time that I start exploring this option and
get back with my analysis.

Thanks again.

Saumitra



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-TCP-layer-tp4191715p4192176.html
Sent from the Solr - User mailing list archive at Nabble.com.


Chaining components in request handler

2015-03-10 Thread Ashish Mukherjee
Hello,

I would like to create a request handler which chains components in a
particular sequence to return the result, similar to a Unix pipe.

eg. Component 1 -> result1 -> Component 2 -> result2

result2 is final result returned.

Component 1 may be a standard component, Component 2 may be out of the box.

Is there any tutorial which describes how to wire together components like
this in a single handler?

Regards,
Ashish


Re: Cores and and ranking (search quality)

2015-03-10 Thread Shawn Heisey
On 3/10/2015 11:17 AM, johnmu...@aol.com wrote:
 If I have two cores, one core has 10 docs another has 100,000 docs.  I then 
 submit two docs that are 100% identical (with the exception of the unique-ID 
 fields, which is stored but not indexed) one to each core.  The question is, 
 during search, will both of those docs rank near each other or not?  If so, 
 this is great because it will behave the same as if I had one core and index 
 both docs to this single core.  If not, which core's doc will rank higher and 
 how far apart will the two docs be from each other in the ranking?

 Put another way: do docs from the smaller core (the one with only 10 docs) 
 rank higher or lower compared to docs from the larger core (the one with 
 100,000 docs)?

Without specific knowledge about the document in question as well as all
the other documents, this is impossible to answer, except to say that
the relative ranking position is likely to be different.  Dropping back
to general info:

The overall term frequency and inverse document frequency (TF-IDF) in
the 100,000 document index will very likely be quite a lot different
than in the 10 document index.  That will affect ranking order. 
Sometimes users are surprised by the results they get, but it is very
rare to find a bug in Lucene scoring.

In addition to the debug parameter that Erick told you about, here are a
couple of classes you could investigate at the source code level for
more information about ranking:

http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/search/similarities/Similarity.html
http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/search/similarities/DefaultSimilarity.html

Here's info that is more general, and from a much earlier Lucene version:

https://lucene.apache.org/core/3_6_2/scoring.html

I have my Solr install configured to use the BM25 similarity.

http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/search/similarities/BM25Similarity.html
http://en.wikipedia.org/wiki/Okapi_BM25

SOLR-1632 aims to make TF-IDF the same across multiple cores as you
would get if you only had one core.  I do not know enough about it to
know whether it is EXACTLY the same, or only an approximation ... but in
a search context, 100 percent precise calculation is rarely required. 
When you drop that as a requirement, search becomes easier and a LOT faster.
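
As a rough illustration of why corpus size shifts scores, the classic Lucene (DefaultSimilarity) idf can be computed by hand -- the document counts below are hypothetical:

```python
import math

def idf(num_docs, doc_freq):
    # Lucene DefaultSimilarity: idf(t) = 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1.0))

# A term in 2 docs of a 10-doc core vs. 2,000 docs of a 100,000-doc core:
# the idf values differ, so an otherwise identical document can receive a
# different score (and rank) in each core.
small = idf(10, 2)          # ~2.20
large = idf(100000, 2000)   # ~4.91
```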

Thanks,
Shawn



Re: Solr 5.0.0 - Multiple instances sharing Solr server *read-only* dir

2015-03-10 Thread Damien Dykman
Thanks Timothy for the pointer to the Jira ticket. That's exactly it :-)

Erick, the main reason why I would run multiple instances on the same
machine is to simulate a multi node environment. But beyond that, I like
the idea of being able to clearly separate the server dir and the data
dirs. That way the server dir could be deployed by root. Yet Solr
instances could run in userland.

Damien

On 03/10/2015 09:31 AM, Timothy Potter wrote:
 I think the next step here is to ship Solr with the war already extracted
 so that Jetty doesn't need to extract it on first startup -
 https://issues.apache.org/jira/browse/SOLR-7227

 On Tue, Mar 10, 2015 at 10:15 AM, Erick Erickson erickerick...@gmail.com
 wrote:

 If I'm understanding your problem correctly, I think you want the -d
 option,
 then all the -s guys would be under that.

 Just to check, though, why are you running multiple Solrs? There are
 sometimes
 very good reasons, just checking that you're not making things more
 difficult
 than necessary

 Best,
 Erick

 On Mon, Mar 9, 2015 at 4:59 PM, Damien Dykman damien.dyk...@gmail.com
 wrote:
 Hi all,

 Quoted from

 https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference
 When running multiple instances of Solr on the same host, it is more
 common to use the same server directory for each instance and use a
 unique Solr home directory using the -s option.

 Is there a way to achieve this without making *any* changes to the
 extracted content of solr-5.0.0.tgz and only use runtime parameters? In
 other words, make the extracted folder solr-5.0.0 strictly read-only?

 By default, the Solr web app is deployed under server/solr-webapp, as
 per solr-jetty-context.xml. So unless I change solr-jetty-context.xml, I
 cannot make folder solr-5.0.0 read-only to my Solr instances.

 I've figured out how to make the log files and pid file to be located
 under the Solr data dir by doing:

 export SOLR_PID_DIR=mySolrDataDir/logs; \
 export SOLR_LOGS_DIR=mySolrDataDir/logs; \
 bin/solr start -c -z localhost:32101/solr \
  -s mySolrDataDir \
  -a -Dsolr.log=mySolrDataDir/logs \
  -p 31100 -h localhost

 But if there was a way to not have to change solr-jetty-context.xml that
 would be awesome! Thoughts?

 Thanks,
 Damien



Re: Parsing cluster result's docs

2015-03-10 Thread Erick Erickson
You can get some fields back besides ID, see the carrot.title and
carrot.snippet params. I don't know a good way to get the full
underlying documents though.

Best,
Erick

On Mon, Mar 9, 2015 at 9:33 AM, Jorge Luis Lazo jorgeluis1...@gmail.com wrote:
 Hi,

 I have a Solr instance using the clustering component (with the Lingo
 algorithm) working perfectly. However when I get back the cluster results
 only the ID's of these come back with it. What is the easiest way to
 retrieve full documents instead? Should I parse these IDs into a new query
 to Solr, or is there some configuration I am missing to return full docs
 instead of IDs?

 If it matters, I am using Solr 4.10.

 Thanks.


Re: Solrcloud Index corruption

2015-03-10 Thread Erick Erickson
Ahhh, ok. When you reloaded the cores, did you do it core-by-core?
I can see how something could get dropped in that case.

However, if you used the Collections API and two cores mysteriously
failed to reload that would be a bug. Assuming the replicas in question
were up and running at the time you reloaded.

Thanks for letting us know what's going on.
Erick

On Tue, Mar 10, 2015 at 4:34 AM, Martin de Vries
mar...@downnotifier.com wrote:
 Hi,

 this _sounds_ like you somehow don't have indexed=true set for the
 field in question.


 We investigated a lot more. The CheckIndex tool didn't find any error. We
 now think the following happened:
 - We changed the schema two months ago: we changed a field to
 indexed=true. We reloaded the cores, but two of them don't seem to have
 been reloaded (maybe we forgot).
 - We reindexed all content. The new field worked fine.
 - We think the leader changed to a server that didn't reload the core
 - After that the field stopped working for newly indexed documents

 Thanks for your help.


 Martin




 Erick Erickson schreef op 06.03.2015 17:02:

 bq: You say in our case some docs didn't made it to the node, but
 that's not really true: the docs can be found on the corrupted nodes
 when I search on ID. The docs are also complete. The problem is that
 the docs do not appear when I filter on certain fields

 this _sounds_ like you somehow don't have indexed=true set for the
 field in question. But it also sounds like you're saying that search
 on that field works on some nodes but not on others, I'm assuming
 you're adding distrib=false to verify this. It shouldn't be
 possible to have different schema.xml files on the different nodes,
 but you might try checking through the admin UI.

 Network burps shouldn't be related here. If the content is stored,
 then the info made it to Solr intact, so this issue shouldn't be
 related to that.

 Sounds like it may just be the bugs Mark is referencing, sorry I don't
 have the JIRA numbers right off.

 Best,
 Erick

 On Thu, Mar 5, 2015 at 4:46 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 3/5/2015 3:13 PM, Martin de Vries wrote:

 I understand there is not a master in SolrCloud. In our case we use
 haproxy as a load balancer for every request. So when indexing every
 document will be sent to a different solr server, immediately after
 each other. Maybe SolrCloud is not able to handle that correctly?

 SolrCloud can handle that correctly, but currently sending index
 updates to a core that is not the leader of the shard will incur a
 significant performance hit, compared to always sending updates to the
 correct core. A small performance penalty would be understandable,
 because the request must be redirected, but what actually happens is a
 much larger penalty than anyone expected. We have an issue in Jira to
 investigate that performance issue and make it work as efficiently as
 possible. Indexing batches of documents is recommended, not sending one
 document per update request. General performance problems with Solr
 itself can lead to extremely odd and unpredictable behavior from
 SolrCloud. Most often these kinds of performance problems are related
 in some way to memory, either the java heap or available memory in the
 system. http://wiki.apache.org/solr/SolrPerformanceProblems [1] Thanks,
 Shawn




 Links:
 --
 [1] http://wiki.apache.org/solr/SolrPerformanceProblems


Re: Cores and and ranking (search quality)

2015-03-10 Thread johnmunir
Thanks Erick for trying to help, I really appreciate it.  Unfortunately, I'm 
still stuck.

There are times one must know the inner workings and behavior of the software to 
make a design decision, and this is one of them.  If I knew the inner workings 
of Solr, I would not be asking.  In addition, I'm in the design process, so I'm 
not able to fully test.  Besides, my test could be invalid because I may not set 
it up right due to my lack of understanding of the inner workings of Solr.

Given this, I hope you don't mind me asking again.

If I have two cores, one core has 10 docs another has 100,000 docs.  I then 
submit two docs that are 100% identical (with the exception of the unique-ID 
fields, which is stored but not indexed) one to each core.  The question is, 
during search, will both of those docs rank near each other or not?  If so, 
this is great because it will behave the same as if I had one core and index 
both docs to this single core.  If not, which core's doc will rank higher and 
how far apart will the two docs be from each other in the ranking?

Put another way: do docs from the smaller core (the one with only 10 docs) rank 
higher or lower compared to docs from the larger core (the one with 100,000 
docs)?

Thanks!

-- MJ

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, March 10, 2015 11:47 AM
To: solr-user@lucene.apache.org
Subject: Re: Cores and and ranking (search quality)

SOLR-1632 will certainly help. But trying to predict whether your core A or 
core B will appear first doesn't really seem like a good use of time. If you 
actually have a setup like you describe, add debug=all to your query on both 
cores and you'll see all the gory detail of how the scores are calculated, 
providing a definitive answer in _your_ situation.

Best,
Erick

On Mon, Mar 9, 2015 at 5:44 AM,  johnmu...@aol.com wrote:
 (reposting this to see if anyone can help)


 Help me understand this better (regarding ranking).

 If I have two docs that are 100% identical with the exception of uid (which 
 is stored but not indexed).  In a single core setup, if I search xyz such 
 that those 2 docs end up ranking as #1 and #2.  When I switch over to two 
 core setup, doc-A goes to core-A (which has 10 records) and doc-B goes to 
 core-B (which has 100,000 records).

 Now, are you saying in 2 core setup if I search on xyz (just like in singe 
 core setup) this time I will not see doc-A and doc-B as #1 and #2 in ranking? 
  That is, are you saying doc-A may now be somewhere at the top / bottom far 
 away from doc-B?  If so, which will be #1: the doc off core-A (that has 10 
 records) or doc-B off core-B (that has 100,000 records)?

 If I got all this right, are you saying SOLR-1632 will fix this issue such 
 that the end result will now be as if I had 1 core?

 - MJ


 -Original Message-
 From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
 Sent: Thursday, March 5, 2015 9:06 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Cores and and ranking (search quality)

 On Thu, 2015-03-05 at 14:34 +0100, johnmu...@aol.com wrote:
 My question is this: if I put my data in multiple cores and use 
 distributed search will the ranking be different if I had all my data 
 in a single core?

 Yes, it will be different. The practical impact depends on how homogeneous 
 your data are across the shards and how large your shards are. If you have 
 small and dissimilar shards, your ranking will suffer a lot.

 Work is being done to remedy this:
 https://issues.apache.org/jira/browse/SOLR-1632

 Also, will facet and more-like-this quality / result be the same?

 It is not formally guaranteed, but for most practical purposes, faceting on 
 multi-shards will give you the same results as single-shards.

 I don't know about more-like-this. My guess is that it will be affected in 
 the same way that standard searches are.

 Also, reading the distributed search wiki
 (http://wiki.apache.org/solr/DistributedSearch) it looks like Solr 
 does the search and result merging (all I have to do is issue a 
 search), is this correct?

 Yes. From a user-perspective, searches are no different.

 - Toke Eskildsen, State and University Library, Denmark




Re: Chaining components in request handler

2015-03-10 Thread Alexandre Rafalovitch
Is that during indexing or during query phase?

Indexing has UpdateRequestProcessors (e.g.
http://www.solr-start.com/info/update-request-processors/ )
Query has Components (e.g. Faceting, MoreLIkeThis, etc)

Or something different?

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 10 March 2015 at 13:34, Ashish Mukherjee ashish.mukher...@gmail.com wrote:
 Hello,

 I would like to create a request handler which chains components in a
 particular sequence to return the result, similar to a Unix pipe.

 eg. Component 1 -> result1 -> Component 2 -> result2

 result2 is final result returned.

 Component 1 may be a standard component, Component 2 may be out of the box.

 Is there any tutorial which describes how to wire together components like
 this in a single handler?

 Regards,
 Ashish


Re: Field Rename in SOLR

2015-03-10 Thread Erick Erickson
What do you mean rename field? It _looks_ like you're trying to get
the results into a doc from your document and changing its name _in
the results_.
I.e. you have ProductName in your document, but want to see
Name_en-US in your output.

My guess is that the hyphen is the problem. Does it work if you try to
get Name_en_US? Generally, hyphens are a bad idea with field names.

Best,
Erick

On Mon, Mar 9, 2015 at 2:38 PM, EXTERNAL Taminidi Ravi (ETI,
AA-AS/PAS-PTS) external.ravi.tamin...@us.bosch.com wrote:
 Hello, does anyone know how to rename a field with the below field name? When 
 I try the below method it says "undefined field Name_en"

 fl=ProductName:Name_en-US

 It throws an error saying undefined field 'Name_en'; it is not recognizing the 
 full field name 'Name_en-US'.

 Is there any work around..?

 Thanks

 Ravi



Solr 5 upgrade

2015-03-10 Thread richardg
Ubuntu 14.04.02
Trying to install solr 5 following this:
https://cwiki.apache.org/confluence/display/solr/Upgrading+a+Solr+4.x+Cluster+to+Solr+5.0

I keep getting "this script requires extracting a war file with either the
jar or unzip utility, please install these utilities or contact your
administrator for assistance." after running install_solr_service.sh.  It
says "Service solr installed." but when I try to run the service I get the
above error.  Not sure the resolution.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-5-upgrade-tp4192127.html
Sent from the Solr - User mailing list archive at Nabble.com.


Num docs, block join, and dupes?

2015-03-10 Thread Timothy Potter
Before I open a JIRA, I wanted to put this out to solicit feedback on what
I'm seeing and what Solr should be doing. So I've indexed the following 8
docs into a 2-shard collection (Solr 4.8'ish - internal custom branch
roughly based on 4.8) ... notice that the 3 grand-children of 2-1 have
dup'd keys:

[
  {
    "id":"1",
    "name":"parent",
    "_childDocuments_":[
      {
        "id":"1-1",
        "name":"child"
      },
      {
        "id":"1-2",
        "name":"child"
      }
    ]
  },
  {
    "id":"2",
    "name":"parent",
    "_childDocuments_":[
      {
        "id":"2-1",
        "name":"child",
        "_childDocuments_":[
          {
            "id":"2-1-1",
            "name":"grandchild"
          },
          {
            "id":"2-1-1",
            "name":"grandchild2"
          },
          {
            "id":"2-1-1",
            "name":"grandchild3"
          }
        ]
      }
    ]
  }
]

When I query this collection, using:

http://localhost:8984/solr/blockjoin2_shard2_replica1/select?q=*%3A*&wt=json&indent=true&shards.info=true&rows=10

I get:

{
  "responseHeader":{
    "status":0,
    "QTime":9,
    "params":{
      "indent":"true",
      "q":"*:*",
      "shards.info":"true",
      "wt":"json",
      "rows":"10"}},
  "shards.info":{
    "http://localhost:8984/solr/blockjoin2_shard1_replica1/|http://localhost:8985/solr/blockjoin2_shard1_replica2/":{
      "numFound":3,
      "maxScore":1.0,
      "shardAddress":"http://localhost:8984/solr/blockjoin2_shard1_replica1",
      "time":4},
    "http://localhost:8984/solr/blockjoin2_shard2_replica1/|http://localhost:8985/solr/blockjoin2_shard2_replica2/":{
      "numFound":5,
      "maxScore":1.0,
      "shardAddress":"http://localhost:8985/solr/blockjoin2_shard2_replica2",
      "time":4}},
  "response":{"numFound":6,"start":0,"maxScore":1.0,"docs":[
      {
        "id":"1-1",
        "name":"child"},
      {
        "id":"1-2",
        "name":"child"},
      {
        "id":"1",
        "name":"parent",
        "_version_":1495272401329455104},
      {
        "id":"2-1-1",
        "name":"grandchild"},
      {
        "id":"2-1",
        "name":"child"},
      {
        "id":"2",
        "name":"parent",
        "_version_":1495272401361960960}]
  }}


So Solr has de-duped the results.

If I execute this query against the shard that has the dupes (distrib=false):

http://localhost:8984/solr/blockjoin2_shard2_replica1/select?q=*%3A*&wt=json&indent=true&shards.info=true&rows=10&distrib=false

Then the dupes are returned:

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "indent":"true",
      "q":"*:*",
      "shards.info":"true",
      "distrib":"false",
      "wt":"json",
      "rows":"10"}},
  "response":{"numFound":5,"start":0,"docs":[
      {
        "id":"2-1-1",
        "name":"grandchild"},
      {
        "id":"2-1-1",
        "name":"grandchild2"},
      {
        "id":"2-1-1",
        "name":"grandchild3"},
      {
        "id":"2-1",
        "name":"child"},
      {
        "id":"2",
        "name":"parent",
        "_version_":1495272401361960960}]
  }}

So I guess my question is why doesn't the non-distrib query do
de-duping? Mainly confirming this is how it's supposed to work and
this behavior doesn't strike anyone else as odd ;-)

Cheers,

Tim


Re: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-03-10 Thread Erick Erickson
Thanks for letting us know!

Erick

On Tue, Mar 10, 2015 at 5:20 AM, Dmitry Kan solrexp...@gmail.com wrote:
 For the sake of the story's completeness, just wanted to confirm these params
 had a positive effect:

 -Dsolr.solr.home=cores -Xmx12000m -Djava.awt.headless=true -XX:+UseParNewGC
 -XX:+ExplicitGCInvokesConcurrent -XX:+UseConcMarkSweepGC
 -XX:MaxTenuringThreshold=8 -XX:CMSInitiatingOccupancyFraction=40

 This freed up a couple dozen GBs on the Solr server!

 On Tue, Feb 17, 2015 at 1:47 PM, Dmitry Kan solrexp...@gmail.com wrote:

 Thanks Toke!

 Now I consistently see the saw-tooth pattern on two shards with new GC
 parameters, next I will try your suggestion.

 The current params are:

 -Xmx25600m -XX:+UseParNewGC -XX:+ExplicitGCInvokesConcurrent
 -XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=8
 -XX:CMSInitiatingOccupancyFraction=40

 Dmitry

 On Tue, Feb 17, 2015 at 1:34 PM, Toke Eskildsen t...@statsbiblioteket.dk
 wrote:

 On Tue, 2015-02-17 at 11:05 +0100, Dmitry Kan wrote:
  Solr: 4.10.2 (high load, mass indexing)
  Java: 1.7.0_76 (Oracle)
  -Xmx25600m
 
 
  Solr: 4.3.1 (normal load, no mass indexing)
  Java: 1.7.0_11 (Oracle)
  -Xmx25600m
 
  The RAM consumption remained the same after the load has stopped on the
  4.10.2 cluster. Manually collecting the memory on a 4.10.2 shard via
  jvisualvm dropped the used RAM from 8,5G to 0,5G. But the reserved RAM
 as
  seen by top remained at 9G level.

 As the JVM does not free OS memory once allocated, top just shows
 whatever peak it reached at some point. When you tell the JVM that it is
 free to use 25GB, it makes a lot of sense to allocate a fair chunk of
 that instead of garbage collecting if there is a period of high usage
 (mass indexing for example).

  What else could be the artifact of such a difference -- Solr or JVM?
 Can it
  only be explained by the mass indexing? What is worrisome is that the
  4.10.2 shard reserves 8x times it uses.

 If you set your Xmx to a lot less, the JVM will probably favour more
 frequent garbage collections over extra heap allocation.

 - Toke Eskildsen, State and University Library, Denmark





 --
 Dmitry Kan
 Luke Toolbox: http://github.com/DmitryKey/luke
 Blog: http://dmitrykan.blogspot.com
 Twitter: http://twitter.com/dmitrykan
 SemanticAnalyzer: www.semanticanalyzer.info




 --
 Dmitry Kan
 Luke Toolbox: http://github.com/DmitryKey/luke
 Blog: http://dmitrykan.blogspot.com
 Twitter: http://twitter.com/dmitrykan
 SemanticAnalyzer: www.semanticanalyzer.info


Re: Solr 5.0.0 - Multiple instances sharing Solr server *read-only* dir

2015-03-10 Thread Timothy Potter
I think the next step here is to ship Solr with the war already extracted
so that Jetty doesn't need to extract it on first startup -
https://issues.apache.org/jira/browse/SOLR-7227

On Tue, Mar 10, 2015 at 10:15 AM, Erick Erickson erickerick...@gmail.com
wrote:

 If I'm understanding your problem correctly, I think you want the -d
 option,
 then all the -s guys would be under that.

 Just to check, though, why are you running multiple Solrs? There are
 sometimes
 very good reasons, just checking that you're not making things more
 difficult
 than necessary

 Best,
 Erick

 On Mon, Mar 9, 2015 at 4:59 PM, Damien Dykman damien.dyk...@gmail.com
 wrote:
  Hi all,
 
  Quoted from
 
 https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference
 
  "When running multiple instances of Solr on the same host, it is more
  common to use the same server directory for each instance and use a
  unique Solr home directory using the -s option."
 
  Is there a way to achieve this without making *any* changes to the
  extracted content of solr-5.0.0.tgz and only use runtime parameters? In
  other words, make the extracted folder solr-5.0.0 strictly read-only?
 
  By default, the Solr web app is deployed under server/solr-webapp, as
  per solr-jetty-context.xml. So unless I change solr-jetty-context.xml, I
  cannot make folder solr-5.0.0 read-only to my Solr instances.
 
  I've figured out how to make the log files and pid file to be located
  under the Solr data dir by doing:
 
  export SOLR_PID_DIR=mySolrDataDir/logs; \
  export SOLR_LOGS_DIR=mySolrDataDir/logs; \
  bin/solr start -c -z localhost:32101/solr \
   -s mySolrDataDir \
   -a -Dsolr.log=mySolrDataDir/logs \
   -p 31100 -h localhost
 
  But if there was a way to not have to change solr-jetty-context.xml that
  would be awesome! Thoughts?
 
  Thanks,
  Damien



Re: Solr TCP layer

2015-03-10 Thread Erick Erickson
Just to pile on:

I admire your bravery! I'll add to the other comments only by saying
that _before_ you start down this path, you really need to articulate
the benefit/cost analysis. "To gain a little more communications
efficiency" will be a pretty hard sell due to the reasons Shawn
outlined. This is hugely risky and would require a lot of work for
as-yet-unarticulated benefits.

There are lots and lots of other things to work on of significantly
greater impact IMO. How would you like to work on something to help
manage Solr's memory usage for instance ;)?

Best,
Erick

On Mon, Mar 9, 2015 at 9:24 AM, Reitzel, Charles
charles.reit...@tiaa-cref.org wrote:
 A couple thoughts:
 0. Interesting topic.
 1. But perhaps better suited to the dev list.
 2. Given the existing architecture, shouldn't we be looking to transport 
 projects, e.g. Jetty, Apache HttpComponents, for support of new socket or 
 even HTTP layer protocols?
 3. To the extent such support exists, then integration work is still needed 
 at the solr level.  Shalin, is this your intention?

 Also, for those of us not tracking protocol standards in detail, can you 
 describe the benefits to Solr users of http/2?

 Do you expect HTTP/2 to be transparent at the application layer?

 -Original Message-
 From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
 Sent: Monday, March 09, 2015 6:23 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr TCP layer

 Hi Saumitra,

 I've been thinking of adding http/2 support for inter node communication 
 initially and client server communication next in Solr. There's a patch for 
 SPDY support but now that spdy is deprecated and http/2 is the new standard 
 we need to wait for Jetty 9.3 to release. That will take care of many 
 bottlenecks in solrcloud communication. The current trunk is already using 
 jetty 9.2.x which has support for the draft http/2 spec.

 A brand new async TCP layer based on netty can be considered but that's a 
 huge amount of work considering our need to still support simple http, SSL 
 etc. Frankly for me that effort is better spent optimizing the routing layer.
 On 09-Mar-2015 1:37 am, Saumitra Srivastav saumitra.srivast...@gmail.com
 wrote:

 Dear Solr Contributors,

 I want to start working on adding a TCP layer for client to node and
 inter-node communication.

 I am not up to date on recent changes happening to Solr. So before I
 start looking into code, I would like to know if there is already some
 work done in this direction, which I can reuse. Are there any know
 challenges/complexities?

 I would appreciate any help to kick start this effort. Also, what
 would be the best way to discuss and get feedback on design from
 contributors? Open a JIRA??

 Regards,
 Saumitra





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-TCP-layer-tp4191715.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 *
 This e-mail may contain confidential or privileged information.
 If you are not the intended recipient, please notify the sender immediately 
 and then delete it.

 TIAA-CREF
 *


Re: Solr 5.0.0 - Multiple instances sharing Solr server *read-only* dir

2015-03-10 Thread Erick Erickson
If I'm understanding your problem correctly, I think you want the -d option,
then all the -s guys would be under that.

Just to check, though, why are you running multiple Solrs? There are sometimes
very good reasons, just checking that you're not making things more difficult
than necessary

Best,
Erick

On Mon, Mar 9, 2015 at 4:59 PM, Damien Dykman damien.dyk...@gmail.com wrote:
 Hi all,

 Quoted from
 https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference

 "When running multiple instances of Solr on the same host, it is more
 common to use the same server directory for each instance and use a
 unique Solr home directory using the -s option."

 Is there a way to achieve this without making *any* changes to the
 extracted content of solr-5.0.0.tgz and only use runtime parameters? In
 other words, make the extracted folder solr-5.0.0 strictly read-only?

 By default, the Solr web app is deployed under server/solr-webapp, as
 per solr-jetty-context.xml. So unless I change solr-jetty-context.xml, I
 cannot make folder solr-5.0.0 read-only to my Solr instances.

 I've figured out how to make the log files and pid file to be located
 under the Solr data dir by doing:

 export SOLR_PID_DIR=mySolrDataDir/logs; \
 export SOLR_LOGS_DIR=mySolrDataDir/logs; \
 bin/solr start -c -z localhost:32101/solr \
  -s mySolrDataDir \
  -a -Dsolr.log=mySolrDataDir/logs \
  -p 31100 -h localhost

 But if there was a way to not have to change solr-jetty-context.xml that
 would be awesome! Thoughts?

 Thanks,
 Damien


Re: Cores and and ranking (search quality)

2015-03-10 Thread johnmunir
Thanks Walter.

The design decision I'm trying to solve is this: using multiple cores, will my 
ranking be impacted vs. using a single core?

I have records to index and each record can be grouped into object-types, such 
as object-A, object-B, object-C, etc.  I have a total of 30 (maybe more) 
object-types.  There may be only 10 records of object-A, but 10 million records 
of object-B or 1 million of object-C, etc.  I need to be able to search against 
a single object-type and / or across all object-types.

From my past experience, in a single core setup, if I have two identical 
records, and I search on the term "XYZ" that matches one of the records, the 
second record ranks right next to the other (because it too contains XYZ).  
This is good and is the expected behavior.  If I want to limit my search to an 
object-type, I AND XYZ with that object-type.  So all is well.

What I'm considering to do for my new design is use multi-cores and distributed 
search.  I am considering to create a core for each object-type: core-A will 
hold records from object-A, core-B will hold records from object-B, etc.  
Before I can make a decision on this design, I need to know how ranking will be 
impacted.

Going back to my earlier example: if I have 2 identical records, one of them 
went to core-A which has 10 records, and the other went to core-B which has 10 
million records, using distributed search, if I now search across all cores on 
the term  XYZ (just like in the single core case), it will match both of 
those records all right, but will those two records be ranked next to each 
other just like in the single core case?  If not, which will rank higher, the 
one from core-A or the one from core-B?

My concern is, using multi-cores and distributed search means I will give up on 
rank quality when records are not distributed across cores evenly.  If so, then 
maybe this is not a design I can use.

- MJ

-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: Tuesday, March 10, 2015 2:39 PM
To: solr-user@lucene.apache.org
Subject: Re: Cores and and ranking (search quality)

On Mar 10, 2015, at 10:17 AM, johnmu...@aol.com wrote:

 If I have two cores, one core has 10 docs another has 100,000 docs.  I then 
 submit two docs that are 100% identical (with the exception of the unique-ID 
 field, which is stored but not indexed) one to each core.  The question is, 
 during search, will both of those docs rank near each other or not? […]
 
 Put another way: do docs from the smaller core (the one with only 10 docs) 
 rank higher or lower compared to docs from the larger core (the one with 
 100,000 docs)?

These are not quite the same question.

tf.idf ranking depends on the other documents in the collection (the idf term). 
With 10 docs, the document frequency statistics are effectively random noise, 
so the ranking is unpredictable.

Identical documents should rank identically, but whether they are higher or 
lower in the two cores depends on the rest of the docs.

idf statistics don’t settle down until at least 10K docs. You still sometimes 
see anomalies under a million documents. 

What design decision do you need to make? We can probably answer that for you.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Invalid Date String:'1992-07-10T17'

2015-03-10 Thread Chris Hostetter

":" is a syntactically significant character to the query parser, so it's 
getting confused by it in the text of your query.

you're seeing the same problem as if you tried to search for "foo:bar" in 
the "yak" field using q=yak:foo:bar

you either need to backslash escape the ":" characters, or wrap the date 
in quotes, or use a diff parser that doesn't treat colons as special 
characters (but remember that since you are building this up as a java 
string, you have to deal with *java* string escaping as well)...

   String a = "speechDate:1992-07-10T17\\:33\\:18Z";
   String a = "speechDate:\"1992-07-10T17:33:18Z\"";
   String a = "speechDate:" + 
ClientUtils.escapeQueryChars("1992-07-10T17:33:18Z");
   String a = "{!field f=speechDate}1992-07-10T17:33:18Z";
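To make the escaping concrete, here is a self-contained sketch. It re-implements the special-character set that SolrJ's ClientUtils.escapeQueryChars handles, purely for illustration; in real SolrJ code just call ClientUtils directly:

```java
public class EscapeDemo {
    // Simplified stand-in for ClientUtils.escapeQueryChars: backslash-escape
    // the Lucene query-parser special characters, including ':' and '-'.
    static String escapeQueryChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if ("\\+-!():^[]\"{}~*?|&;/".indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("speechDate:" + escapeQueryChars("1992-07-10T17:33:18Z"));
        // -> speechDate:1992\-07\-10T17\:33\:18Z
    }
}
```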

: My goal is to group these speeches (hopefully using date math syntax). I would

Unless you are truly searching for only documents that have an *exact* 
date value matching your input (down to the millisecond) then searching for 
a single date value is almost certainly not what you want -- you most 
likely want to do a range search...

  String a = "speechDate:[1992-07-10T00:00:00Z TO 1992-07-11T00:00:00Z]";

(which doesn't require special escaping, because the query parser is smart 
enough to know that ":" isn't special inside of the [..])

: like to know if you suggest me to use date or tdate or other because I have
: not understood the difference.

the difference between date and tdate has to do with how you want to trade 
index size (on disk & in RAM) with search speed for range queries like 
these -- tdate takes up a little more room in the index, but can make 
range queries faster.
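For reference, the stock Solr 4.x example schema.xml declares the two variants like this — only precisionStep differs; the speechDate field line is a hypothetical example, not from the original poster's schema:

```xml
<fieldType name="date"  class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>

<!-- hypothetical field using the trie variant for faster range queries -->
<field name="speechDate" type="tdate" indexed="true" stored="true"/>
```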


-Hoss
http://www.lucidworks.com/


Re: Cores and and ranking (search quality)

2015-03-10 Thread Walter Underwood
If the documents are distributed randomly across shards/cores, then the 
statistics will be similar in each core and the results will be similar.

If the documents are distributed semantically (say, by topic or type), the 
statistics of each core will be skewed towards that set of documents and the 
results could be quite different.

Assume I have tech support documents and I put all the LaserJet docs in one 
core. That term is very common in that core (poor idf) and rare in other cores 
(strong idf). But for the query “laserjet”, all the good answers are in the 
LaserJet-specific core, where they will be scored low.

An identical document that mentions “LaserJet” once will score fairly low in 
the LaserJet-specific collection and fairly high in the other collection.

Global IDF fixes this, by using corpus-wide statistics. That’s how we ran 
Infoseek and Ultraseek in the late 1990’s.

Random allocation to cores avoids it.

If you have significant traffic directed to one object type AND you need peak 
performance, you may want to segregate your cores by object type. Otherwise, 
I’d let SolrCloud spread them around randomly and filter based on an object 
type field. That should work well for most purposes.

Any core with less than 1000 records is likely to give somewhat mysterious 
results. A word that is common in English, like “next”, will only be in one 
document and will score too high. A less-common word, like “unreasonably”, will 
be in 20 and will score low. You need lots of docs for the language statistics 
to even out.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Mar 10, 2015, at 1:23 PM, johnmu...@aol.com wrote:

 Thanks Walter.
 
 The design decision I'm trying to solve is this: using multiple cores, will 
 my ranking be impacted vs. using a single core?
 
 I have records to index and each record can be grouped into object-types, 
 such as object-A, object-B, object-C, etc.  I have a total of 30 (maybe more) 
 object-types.  There may be only 10 records of object-A, but 10 million 
 records of object-B or 1 million of object-C, etc.  I need to be able to 
 search against a single object-type and / or across all object-types.
 
 From my past experience, in a single core setup, if I have two identical 
 records, and I search on the term "XYZ" that matches one of the records, the 
 second record ranks right next to the other (because it too contains XYZ).  
 This is good and is the expected behavior.  If I want to limit my search to 
 an object-type, I AND XYZ with that object-type.  So all is well.
 
 What I'm considering to do for my new design is use multi-cores and 
 distributed search.  I am considering to create a core for each object-type: 
 core-A will hold records from object-A, core-B will hold records from 
 object-B, etc.  Before I can make a decision on this design, I need to know 
 how ranking will be impacted.
 
 Going back to my earlier example: if I have 2 identical records, one of them 
 went to core-A which has 10 records, and the other went to core-B which has 
 10 million records, using distributed search, if I now search across all 
 cores on the term  XYZ (just like in the single core case), it will match 
 both of those records all right, but will those two records be ranked next to 
 each other just like in the single core case?  If not, which will rank 
 higher, the one from core-A or the one from core-B?
 
 My concern is, using multi-cores and distributed search means I will give up 
 on rank quality when records are not distributed across cores evenly.  If so, 
 then maybe this is not a design I can use.
 
 - MJ
 
 -Original Message-
 From: Walter Underwood [mailto:wun...@wunderwood.org] 
 Sent: Tuesday, March 10, 2015 2:39 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Cores and and ranking (search quality)
 
 On Mar 10, 2015, at 10:17 AM, johnmu...@aol.com wrote:
 
 If I have two cores, one core has 10 docs another has 100,000 docs.  I then 
 submit two docs that are 100% identical (with the exception of the unique-ID 
 field, which is stored but not indexed) one to each core.  The question is, 
 during search, will both of those docs rank near each other or not? […]
 
 Put another way: do docs from the smaller core (the one with only 10 docs) 
 rank higher or lower compared to docs from the larger core (the one with 
 100,000 docs)?
 
 These are not quite the same question.
 
 tf.idf ranking depends on the other documents in the collection (the idf 
 term). With 10 docs, the document frequency statistics are effectively random 
 noise, so the ranking is unpredictable.
 
 Identical documents should rank identically, but whether they are higher or 
 lower in the two cores depends on the rest of the docs.
 
 idf statistics don’t settle down until at least 10K docs. You still sometimes 
 see anomalies under a million documents. 
 
 What design decision do you need to make? We can probably answer that for you.
 
 

Re: Num docs, block join, and dupes?

2015-03-10 Thread Jessica Mallet
We've seen this as well. Before we understood the cause, it seemed very
bizarre that hitting different nodes would yield different numFound, as
well as using different rows=N (since the proxying node only de-dupes the
documents that are returned in the response).

I think consistency and correctness should be clearly delineated. Of
course we'd rather have consistently correct results, but failing that, I'd
rather have consistently incorrect results than inconsistent results,
because otherwise it's even harder to debug, as was the case here.

I think either the node hosting the shard should also do the de-duping, or
no one should. It's strange that the proxying node decides to do some
sketchy limited result set de-dupe.

On Tue, Mar 10, 2015 at 9:09 AM, Timothy Potter thelabd...@gmail.com
wrote:

 Before I open a JIRA, I wanted to put this out to solicit feedback on what
 I'm seeing and what Solr should be doing. So I've indexed the following 8
 docs into a 2-shard collection (Solr 4.8'ish - internal custom branch
 roughly based on 4.8) ... notice that the 3 grand-children of 2-1 have
 dup'd keys:

  [
    {
      "id":"1",
      "name":"parent",
      "_childDocuments_":[
        {
          "id":"1-1",
          "name":"child"
        },
        {
          "id":"1-2",
          "name":"child"
        }
      ]
    },
    {
      "id":"2",
      "name":"parent",
      "_childDocuments_":[
        {
          "id":"2-1",
          "name":"child",
          "_childDocuments_":[
            {
              "id":"2-1-1",
              "name":"grandchild"
            },
            {
              "id":"2-1-1",
              "name":"grandchild2"
            },
            {
              "id":"2-1-1",
              "name":"grandchild3"
            }
          ]
        }
      ]
    }
  ]

 When I query this collection, using:


http://localhost:8984/solr/blockjoin2_shard2_replica1/select?q=*%3A*&wt=json&indent=true&shards.info=true&rows=10

 I get:

  {
    "responseHeader":{
      "status":0,
      "QTime":9,
      "params":{
        "indent":"true",
        "q":"*:*",
        "shards.info":"true",
        "wt":"json",
        "rows":"10"}},
    "shards.info":{
      "http://localhost:8984/solr/blockjoin2_shard1_replica1/|http://localhost:8985/solr/blockjoin2_shard1_replica2/":{
        "numFound":3,
        "maxScore":1.0,
        "shardAddress":"http://localhost:8984/solr/blockjoin2_shard1_replica1",
        "time":4},
      "http://localhost:8984/solr/blockjoin2_shard2_replica1/|http://localhost:8985/solr/blockjoin2_shard2_replica2/":{
        "numFound":5,
        "maxScore":1.0,
        "shardAddress":"http://localhost:8985/solr/blockjoin2_shard2_replica2",
        "time":4}},
    "response":{"numFound":6,"start":0,"maxScore":1.0,"docs":[
        {
          "id":"1-1",
          "name":"child"},
        {
          "id":"1-2",
          "name":"child"},
        {
          "id":"1",
          "name":"parent",
          "_version_":1495272401329455104},
        {
          "id":"2-1-1",
          "name":"grandchild"},
        {
          "id":"2-1",
          "name":"child"},
        {
          "id":"2",
          "name":"parent",
          "_version_":1495272401361960960}]
    }}


 So Solr has de-duped the results.

 If I execute this query against the shard that has the dupes
(distrib=false):


http://localhost:8984/solr/blockjoin2_shard2_replica1/select?q=*%3A*&wt=json&indent=true&shards.info=true&rows=10&distrib=false

 Then the dupes are returned:

  {
    "responseHeader":{
      "status":0,
      "QTime":0,
      "params":{
        "indent":"true",
        "q":"*:*",
        "shards.info":"true",
        "distrib":"false",
        "wt":"json",
        "rows":"10"}},
    "response":{"numFound":5,"start":0,"docs":[
        {
          "id":"2-1-1",
          "name":"grandchild"},
        {
          "id":"2-1-1",
          "name":"grandchild2"},
        {
          "id":"2-1-1",
          "name":"grandchild3"},
        {
          "id":"2-1",
          "name":"child"},
        {
          "id":"2",
          "name":"parent",
          "_version_":1495272401361960960}]
    }}

 So I guess my question is why doesn't the non-distrib query do
 de-duping? Mainly confirming this is how it's supposed to work and
 this behavior doesn't strike anyone else as odd ;-)

 Cheers,

 Tim


Re: Solr TCP layer

2015-03-10 Thread Walter Underwood
I would strongly recommend taking a look at HTTP/2. It might not be fast enough 
for you, but it is fast enough for Google and there are already implementations.

http://http2.github.io/faq/

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Mar 10, 2015, at 11:18 AM, Erick Erickson erickerick...@gmail.com wrote:

 Saumitra:
 
 We certainly don't mean to be overly discouraging, so have at it!
 There has been some talk of using Netty in the future as we pull the
 war-file distribution out of the distro. Now, I have no technical clue
 about the merits .vs. TCP. But that's another possibility you might
 want to put into your analysis.
 
 Best,
 Erick
 
 On Tue, Mar 10, 2015 at 11:13 AM, Saumitra Srivastav
 saumitra.srivast...@gmail.com wrote:
 Thanks everyone for the responses.
 
 My motivation for TCP is coming from a very heavy indexing pipeline where
 the smallest of optimization matters. I am working on a machine data parser
 which feeds data into Cassandra and Solr and we have SLAs based on how fast
 we can make data available in both the sources. We used to have issues with
 Cassandra as well but we optimized the s**t out of it.
 
 Now we want to do the same with Solr. While I do realize that this is going
 to be a lot of work, but if its something that will reap benefit in long
 run, then so be it. Datastax provides a netty based layer in their
 enterprise version which folks have reported to be faster. Now just because
 a commercial vendor ships it, doesn't mean we will jump into it without
 thinking. We will definitely do a effect-vs-effort analysis before
 committing to this.
 
 For majority of users, such high performance might not be a
 requirement/priority, so I understand the reluctance to go down this path.
 
 I think it would be best at this time that I start exploring this option and
 get back with my analysis.
 
 Thanks again.
 
 Saumitra
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-TCP-layer-tp4191715p4192176.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr TCP layer

2015-03-10 Thread Erick Erickson
Saumitra:

We certainly don't mean to be overly discouraging, so have at it!
There has been some talk of using Netty in the future as we pull the
war-file distribution out of the distro. Now, I have no technical clue
about the merits .vs. TCP. But that's another possibility you might
want to put into your analysis.

Best,
Erick

On Tue, Mar 10, 2015 at 11:13 AM, Saumitra Srivastav
saumitra.srivast...@gmail.com wrote:
 Thanks everyone for the responses.

 My motivation for TCP is coming from a very heavy indexing pipeline where
 the smallest of optimization matters. I am working on a machine data parser
 which feeds data into Cassandra and Solr and we have SLAs based on how fast
 we can make data available in both the sources. We used to have issues with
 Cassandra as well but we optimized the s**t out of it.

 Now we want to do the same with Solr. While I do realize that this is going
 to be a lot of work, but if its something that will reap benefit in long
 run, then so be it. Datastax provides a netty based layer in their
 enterprise version which folks have reported to be faster. Now just because
 a commercial vendor ships it, doesn't mean we will jump into it without
 thinking. We will definitely do a effect-vs-effort analysis before
 committing to this.

 For majority of users, such high performance might not be a
 requirement/priority, so I understand the reluctance to go down this path.

 I think it would be best at this time that I start exploring this option and
 get back with my analysis.

 Thanks again.

 Saumitra



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-TCP-layer-tp4191715p4192176.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Invalid Date String:'1992-07-10T17'

2015-03-10 Thread Mirko Torrisi

Hi all,

I am very new with Solr (and Lucene) and I use the last version of it.
I do not understand why I obtain this:

   Exception in thread "main"
   org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
   from server at http://localhost:8983/solr/Collection1: Invalid Date
   String:'1992-07-10T17'
at
   
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:558)
at
   
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:214)
at
   
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:210)
at
   
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
at
   org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:302)
at Update.main(Update.java:18)


Here the code that creates this error:

SolrQuery query = new SolrQuery();
String a = "speechDate:1992-07-10T17:33:18Z";
query.set("fq", a);
//query.setQuery( a );  -- I also tried using this one.



According to
https://cwiki.apache.org/confluence/display/solr/Working+with+Dates, it
should be right. I tried with other dates, or just YYYY-MM-DD, with no
success.



My goal is to group these speeches (hopefully using date math syntax). I
would like to know whether you suggest using date or tdate or something
else, because I have not understood the difference.



Thanks in advance,

Mirko
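[On the date-vs-tdate question: in the stock Solr example schemas, both are solr.TrieDateField and differ only in precisionStep. A sketch of the two definitions, adapted from the example schema (names and values may differ in your distribution):

```xml
<!-- precisionStep="0" indexes one term per value: smallest index,
     slower range queries. precisionStep="6" additionally indexes
     coarser-grained terms, so range queries and date faceting are
     faster at the cost of a slightly larger index. -->
<fieldType name="date"  class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
```

For grouping by date-math buckets over many documents, tdate is usually the better choice.]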


RE: Invalid Date String:'1992-07-10T17'

2015-03-10 Thread Ryan, Michael F. (LNG-DAY)
You'll need to wrap the date in quotes, since it contains a colon:

String a = "speechDate:\"1992-07-10T17:33:18Z\"";

-Michael
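[For illustration, a tiny self-contained sketch of building such a filter query. The class and method names here are made up for the example; plain string handling is enough for a phrase-quoted timestamp, though SolrJ also ships org.apache.solr.client.solrj.util.ClientUtils.escapeQueryChars for escaping individual special characters.

```java
public class DateQueryEscape {
    // Wrap the raw value in double quotes so the colons inside an
    // ISO-8601 timestamp are treated as literal characters rather
    // than as field:value separators by the Solr query parser.
    static String phraseQuery(String field, String value) {
        return field + ":\"" + value + "\"";
    }

    public static void main(String[] args) {
        // The string to pass as the fq parameter:
        System.out.println(phraseQuery("speechDate", "1992-07-10T17:33:18Z"));
        // prints: speechDate:"1992-07-10T17:33:18Z"
    }
}
```
]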

-Original Message-
From: Mirko Torrisi [mailto:mirko.torr...@ucdconnect.ie] 
Sent: Tuesday, March 10, 2015 3:34 PM
To: solr-user@lucene.apache.org
Subject: Invalid Date String:'1992-07-10T17'

Hi all,

I am very new to Solr (and Lucene) and I use the latest version of it.
I do not understand why I obtain this:

Exception in thread "main"
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://localhost:8983/solr/Collection1: Invalid Date
String:'1992-07-10T17'
 at

org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:558)
 at

org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:214)
 at

org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:210)
 at

org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
 at
org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:302)
 at Update.main(Update.java:18)


Here the code that creates this error:

 SolrQuery query = new SolrQuery();
 String a = "speechDate:1992-07-10T17:33:18Z";
 query.set("fq", a);
 //query.setQuery( a );  -- I also tried using this one.



According to
https://cwiki.apache.org/confluence/display/solr/Working+with+Dates, it should
be right. I tried with other dates, or just YYYY-MM-DD, with no success.


My goal is to group these speeches (hopefully using date math syntax). I would 
like to know if you suggest me to use date or tdate or other because I have not 
understood the difference.


Thanks in advance,

Mirko


Re: Cores and and ranking (search quality)

2015-03-10 Thread Walter Underwood
On Mar 10, 2015, at 10:17 AM, johnmu...@aol.com wrote:

 If I have two cores, one core has 10 docs another has 100,000 docs.  I then 
 submit two docs that are 100% identical (with the exception of the unique-ID 
 fields, which is stored but not indexed) one to each core.  The question is, 
 during search, will both of those docs rank near each other or not? […]
 
 Put another way: do docs from the smaller core (the one with only 10 docs) 
 rank higher or lower compared to docs from the larger core (the one with 
 100,000 docs)?

These are not quite the same question.

tf.idf ranking depends on the other documents in the collection (the idf term). 
With 10 docs, the document frequency statistics are effectively random noise, 
so the ranking is unpredictable.

Identical documents should rank identically, but whether they are higher or 
lower in the two cores depends on the rest of the docs.

idf statistics don’t settle down until at least 10K docs. You still sometimes 
see anomalies under a million documents. 
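
As a rough illustration of why: here is the classic Lucene/Solr idf formula (the TFIDFSimilarity default, idf = 1 + ln(numDocs / (docFreq + 1)); your actual similarity may be configured differently) applied to a term with the same document frequency in a 10-doc and a 100,000-doc core:

```java
public class IdfSketch {
    // Classic Lucene/Solr idf (TFIDFSimilarity default):
    // idf = 1 + ln(numDocs / (docFreq + 1))
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        // Same term, present in 3 documents, in two differently sized cores:
        System.out.printf("10-doc core:      idf = %.3f%n", idf(3, 10));
        System.out.printf("100,000-doc core: idf = %.3f%n", idf(3, 100_000));
        // In the 10-doc core, adding or removing a single matching document
        // shifts the idf substantially; in the large core it barely moves.
        // That is why tiny cores rank unpredictably.
    }
}
```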

What design decision do you need to make? We can probably answer that for you.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Solr TCP layer

2015-03-10 Thread Shawn Heisey
On 3/10/2015 12:13 PM, Saumitra Srivastav wrote:
 Now we want to do the same with Solr. While I do realize that this is going
 to be a lot of work, but if its something that will reap benefit in long
 run, then so be it. Datastax provides a netty based layer in their
 enterprise version which folks have reported to be faster.

Netty has been discussed as a replacement for the Servlet API, as one
pathway towards Solr becoming a standalone application.  I'm pretty sure
that the general thinking within the project is to keep using HTTP (that
is one of the protocols that Netty implements) but the hope is that it
would be more efficient than a servlet container.  There is a lot of
evidence that Netty implements network communication much more
efficiently than other libraries.

If you have the experience to do work like that, user contributions are
always welcome.

Thanks,
Shawn



Re: Num docs, block join, and dupes?

2015-03-10 Thread Mikhail Khludnev
On Tue, Mar 10, 2015 at 7:09 PM, Timothy Potter thelabd...@gmail.com
wrote:

 So I guess my question is why doesn't the non-distrib query do
 de-duping?


Tim,
that's by-design behavior. The special _root_ field is used as the delete
term when a block update is applied, i.e. in the case of a block, uniqueKey is
not used. See
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java#L224
I agree that's one of the issues of the current block update
implementation, but frankly speaking, I didn't consider it an oddity. Do
you? What do you want to achieve?

-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Import Feed rss delta-import

2015-03-10 Thread Ednardo
Hi,

How do I create a DataImportHandler using delta-import for rss feeds?

Thanks!!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Import-Feed-rss-delta-import-tp4192257.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr 5: data_driven_schema_config's solrconfig causing error

2015-03-10 Thread Aman Tandon
Hi,

For the sake of using the new schema.xml and solrconfig.xml with Solr 5, I
put my old required field types and field names (being used with Solr 4.8.1)
into the schema.xml given in *basic_configs*, combined with the configuration
settings given in the solrconfig.xml present in *data_driven_schema_configs*,
and put these configuration files in the configs of ZooKeeper.

But when I am creating the core, it gives the error that the 'booleans'
fieldType is not found in the schema. So correct me if I am doing something
wrong.

ERROR - 2015-03-10 08:20:16.788; org.apache.solr.core.CoreContainer; Error
 creating core [core1]: fieldType 'booleans' not found in the schema
 org.apache.solr.common.SolrException: fieldType 'booleans' not found in
 the schema
 at org.apache.solr.core.SolrCore.init(SolrCore.java:896)
 at org.apache.solr.core.SolrCore.init(SolrCore.java:662)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:513)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:488)
 at
 org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:573)
 at
 org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:197)
 at
 org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
 at
 org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:736)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:261)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)
 at
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
 at
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 at
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
 at
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:368)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
 at
 org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
 at
 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
 at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
 at
 org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
 at
 org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: org.apache.solr.common.SolrException: fieldType 'booleans' not
 found in the schema
 at
 org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$TypeMapping.populateValueClasses(AddSchemaFieldsUpdateProcessorFactory.java:244)
 at
 org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory.inform(AddSchemaFieldsUpdateProcessorFactory.java:170)
 at
 org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:620)
 at org.apache.solr.core.SolrCore.init(SolrCore.java:879)
 ... 35 more
 ERROR - 2015-03-10 08:20:16.825; org.apache.solr.common.SolrException;
 org.apache.solr.common.SolrException: Error CREATEing SolrCore 'core1':
 Unable to create core [core1] Caused by: fieldType 'booleans' not found in
 the schema
 at
 org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:606)
 at
 org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:197)
 at
 org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
 at
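
[A likely cause: the data_driven_schema_configs solrconfig.xml configures an add-unknown-fields update processor chain (AddSchemaFieldsUpdateProcessorFactory) whose type mappings reference fieldTypes such as 'booleans' that the basic_configs schema does not define. A sketch of the kind of definition the schema would need, adapted from the data_driven schema (verify against your Solr 5 distribution):

```xml
<!-- Multi-valued boolean type referenced by the
     AddSchemaFieldsUpdateProcessorFactory type mappings -->
<fieldType name="booleans" class="solr.BoolField"
           sortMissingLast="true" multiValued="true"/>
```

The alternative is to remove or adjust the type mappings in the copied solrconfig.xml so they only reference types your schema actually defines.]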
 

Re: Import Feed rss delta-import

2015-03-10 Thread Alexandre Rafalovitch
I don't think you can, since you can't query RSS for deltas. You just do a
full import and overwrite on ids.

Regards,
Alex
On 10 Mar 2015 7:16 pm, Ednardo ednardomart...@gmail.com wrote:

 Hi,

 How do I create a DataImportHandler using delta-import for rss feeds?

 Thanks!!



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Import-Feed-rss-delta-import-tp4192257.html
 Sent from the Solr - User mailing list archive at Nabble.com.
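
[For reference, a minimal rss-data-config.xml sketch for such a full-import. The feed URL and field mappings are made up for the example; the key point is that with the item link mapped to the uniqueKey, re-running full-import (with clean=false) overwrites previously imported items rather than duplicating them:

```xml
<dataConfig>
  <dataSource type="URLDataSource"/>
  <document>
    <!-- XPathEntityProcessor streams the feed and emits one doc per item -->
    <entity name="rss"
            processor="XPathEntityProcessor"
            url="http://example.com/feed.rss"
            forEach="/rss/channel/item">
      <field column="id"    xpath="/rss/channel/item/link"/>
      <field column="title" xpath="/rss/channel/item/title"/>
      <field column="date"  xpath="/rss/channel/item/pubDate"/>
    </entity>
  </document>
</dataConfig>
```
]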