solr-cloud documents decouple issue

2013-04-21 Thread qibaoyuan
Hello, I have plenty of docs, and each doc may be connected to many user-defined tags. I have used solr-cloud and use JOIN to do this kind of job, and recently I found out that solr-cloud does not support distributed search. So this is a big problem so far. And the decouple is quite

custom routing in SolrCloud - shard assignment

2013-04-21 Thread AlexeyK
I'm going to use the ImplicitDocRouter for sharding. Our sharding is not based on a hashing mechanism. As far as I understand, if I don't provide the numShards parameter, the implicit router is used. My question is: Using the implicit routing, how can I assign a new core to a new shard, instead of

Re: solr-cloud problem about user-specified tags

2013-04-21 Thread Erick Erickson
bq: ... i know solr-cloud does not support distributed search.. Huh? Or do you mean that Solr cloud doesn't support distributed join? You really have to give us a better idea of what problem you're trying to solve. You might review: http://wiki.apache.org/solr/UsingMailingLists Best Erick

Re: solr-cloud problem about user-specified tags

2013-04-21 Thread qibaoy...@gmail.com
Yes, solr-cloud does not support distributed join. Any good ideas to solve my problem? Erick Erickson erickerick...@gmail.com wrote: bq: ... i know solr-cloud does not support distributed search.. huh? Or do you mean that solr cloud doesn't support distributed join? You really have to give us a

Re: Updating clusterstate from the zookeeper

2013-04-21 Thread Erick Erickson
I'm pretty sure there's been some hardening of deleting nodes/collections to deal with nodes in a bad state, and I'm pretty sure those changes are available in 4.3. Not guaranteeing that this would solve your problem, but it's probably worth looking at the CHANGES.txt for 4.3 to see if it's worth exploring

Re: SolrCloud loadbalancing, replication, and failover

2013-04-21 Thread Erick Erickson
One note to add. There's been lots of discussion here about index size, which is a slippery concept. To wit: Look at your index directory, specifically the *.fdt and *.fdx files. That's where the verbatim copy of your data is held, i.e. whenever you specify 'stored=true', and is almost totally
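A rough way to see how much of an index directory is taken up by the stored-data files mentioned above is to total file sizes per extension. This is only a sketch using the Python standard library; the function names and the directory path you pass in are assumptions for illustration, not anything from Solr itself.

```python
import os
from collections import defaultdict


def index_size_by_extension(index_dir):
    """Return {extension: total_bytes} for regular files directly in index_dir."""
    totals = defaultdict(int)
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            ext = os.path.splitext(name)[1]  # e.g. ".fdt", ".fdx"
            totals[ext] += os.path.getsize(path)
    return dict(totals)


def stored_fraction(totals, stored_exts=(".fdt", ".fdx")):
    """Fraction of the total bytes held in the stored-field files."""
    total = sum(totals.values())
    if total == 0:
        return 0.0
    stored = sum(totals.get(ext, 0) for ext in stored_exts)
    return stored / total
```

Calling `stored_fraction(index_size_by_extension(path_to_index))` on a real index directory gives the share of on-disk bytes that belongs to the *.fdt and *.fdx files.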

Re: using deletebyid with solrj

2013-04-21 Thread Erick Erickson
Uhhhm HttpSolrServer.deleteById (several varieties)? http://lucene.apache.org/solr/api-4_0_0-BETA/org/apache/solr/client/solrj/SolrServer.html#deleteById(java.util.List) the rest of your question is confusing. bq: how can I reindex in the same way one particular row Solr does that

Re: solr-cloud problem about user-specified tags

2013-04-21 Thread Erick Erickson
Well, the usual answer is to flatten your data such that you do not _have_ to do a join in the first place. But other than that kind of super-general answer, your question is too lacking in detail to provide any useful response. Best Erick On Sun, Apr 21, 2013 at 10:04 AM, qibaoy...@gmail.com
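As an illustration of what "flatten your data" can mean for the docs-plus-tags case in this thread, here is a minimal sketch that folds a separate (doc id, tag) relation into a multi-valued `tags` field on each document before indexing. The field names `id` and `tags` and the function name are assumptions made for the example; the original poster's schema is not shown in the thread.

```python
def flatten(docs, doc_tags):
    """Attach tags to each doc so no join is needed at query time.

    docs:     list of dicts, each with an "id" key (assumed field name)
    doc_tags: iterable of (doc_id, tag) pairs, i.e. the "join table"
    """
    tags_by_doc = {}
    for doc_id, tag in doc_tags:
        tags_by_doc.setdefault(doc_id, set()).add(tag)
    # Copy each doc and add a multi-valued "tags" field (hypothetical name).
    return [
        dict(doc, tags=sorted(tags_by_doc.get(doc["id"], ())))
        for doc in docs
    ]
```

With documents shaped this way, a filter on the `tags` field is an ordinary query that can be distributed across shards. The trade-off is that a tag change means reindexing the affected documents.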

Re: using deletebyid with solrj

2013-04-21 Thread Shawn Heisey
On 4/20/2013 11:21 PM, Tania Marinova wrote: so my question is as I have the ability to index one particular row (you saw my code) how can I reindex in the same way one particular row assuming that I know the id of that row (so i can select it) so it's no longer indexed in solr. My

Solr cloud and batched updates

2013-04-21 Thread Timothy Potter
There's no problem here, but I'm curious about how batches of updates are handled on the Solr server side in Solr cloud? Going over the code for DistributedUpdateProcessor and SolrCmdDistributor, it appears that the batch is broken down and docs are processed one-by-one. By processed, I mean that

CloudSolrServer and update requests

2013-04-21 Thread Timothy Potter
Today is my day for conceptual questions ;-) From what I understand, CloudSolrServer is smart because it uses cluster state information pulled from Zookeeper to send update requests to leaders instead of replicas. This provides a slight benefit in that the update request will land on the correct

Re: Solr cloud and batched updates

2013-04-21 Thread Erick Erickson
I'm pretty sure there's a JIRA to do just that, it just hasn't been implemented yet. I guess it's one of those things that would undoubtedly be more efficient, but whether it would really be noticeable or not is an open question. At any rate, there are more important fish to fry but if you'd like

Re: CloudSolrServer and update requests

2013-04-21 Thread Erick Erickson
Same reply as your other question, I think. It's on the drawing board but hasn't percolated up past other urgent issues... Erick On Sun, Apr 21, 2013 at 1:28 PM, Timothy Potter thelabd...@gmail.com wrote: Today is my day for conceptual questions ;-) From what I understand, CloudSolrServer

Re: CloudSolrServer and update requests

2013-04-21 Thread Timothy Potter
Ok, thanks for both responses - agreed on the bigger fish part too, but for this one I wanted to make sure I wasn't overlooking something. Now that I know it's a reasonable approach, I'll give some more thought. Thanks. Tim On Sun, Apr 21, 2013 at 11:59 AM, Erick Erickson erickerick...@gmail.com

Re: Dynamic data model design questions

2013-04-21 Thread Jack Krupansky
1. Relatively small numbers of dynamic fields are fine. 20-40 would be fine. 1,000 would become problematic. Hundreds would also likely be problematic, but it would depend on the application and its data. 2. Sparse dynamic fields are less problematic than larger numbers of fully populated

What is Partition Split At SolrCloud?

2013-04-21 Thread Furkan KAMACI
I was reading here: http://wiki.apache.org/solr/NewSolrCloudDesign It says something about: *Split_partition*: (params: partition, optional). The partition is split into two halves. If the partition parameter is not supplied, the partition with the largest number of documents is identified as

Re: Solr cloud and batched updates

2013-04-21 Thread Yonik Seeley
On Sun, Apr 21, 2013 at 11:57 AM, Timothy Potter thelabd...@gmail.com wrote: There's no problem here, but I'm curious about how batches of updates are handled on the Solr server side in Solr cloud? Going over the code for DistributedUpdateProcessor and SolrCmdDistributor, it appears that the

Where to use replicationFactor and maxShardsPerNode at SolrCloud?

2013-04-21 Thread Furkan KAMACI
I know that: when using SolrCloud we define the number of shards in the system. When we start up new Solr instances each one will be a leader for a shard, and if I continue to start up new Solr instances (so that they exceed the number of shards) each one will be a replica for each leader

Re: Solr cloud and batched updates

2013-04-21 Thread Timothy Potter
That's awesome! Thanks Yonik. Tim On Sun, Apr 21, 2013 at 1:30 PM, Yonik Seeley yo...@lucidworks.com wrote: On Sun, Apr 21, 2013 at 11:57 AM, Timothy Potter thelabd...@gmail.com wrote: There's no problem here, but I'm curious about how batches of updates are handled on the Solr server side in

Re: Pros and cons of using RAID or different RAIDS?

2013-04-21 Thread Furkan KAMACI
When I read documentation about HBase it says RAID is not recommended for many cases. When we talk about SolrCloud (and consider that if a machine goes down there is a failover mechanism via replicas) and when we think about the purposes of different RAID disks: are they true - using RAID systems for:

Re: Pointing to Hbase for Documents or Directly Saving Documents at Hbase

2013-04-21 Thread Furkan KAMACI
All in all, is there anything that we can say before measuring the performance comparison of storing the stored values of documents in HBase? I mean things like: * I will need to communicate with HBase and this will produce more latency than Lucene * I will lose some built-in functionality that

Re: CloudSolrServer and update requests

2013-04-21 Thread Mark Miller
https://issues.apache.org/jira/browse/SOLR-3154 - Mark On Apr 21, 2013, at 1:28 PM, Timothy Potter thelabd...@gmail.com wrote: Today is my day for conceptual questions ;-) From what I understand, CloudSolrServer is smart because it uses cluster state information pulled from Zookeeper to

Re: Pros and cons of using RAID or different RAIDS?

2013-04-21 Thread Shawn Heisey
On 4/21/2013 4:23 PM, Furkan KAMACI wrote: When I read documentation about Hbase it says RAID is not recommended for many cases. When we talk about SolrCloud (and consider that if a machine goes down there is a failure system via replicas) and when we think about the purposes of different RAID

Bug? JSON output changes when switching to solr cloud

2013-04-21 Thread David Parks
We just took an installation of 4.1 which was working fine and changed it to run as solr cloud. We encountered the most incredibly bizarre apparent bug: In the JSON output, a colon ':' changed to a comma ',', which of course broke the JSON parser. I'm guessing I should file this as a bug, but it

Re: is phrase search possible in solr

2013-04-21 Thread vicky desai
Hi, Agreed it is a typo. And yes I can use one set of analyzers and tokenizers for query as well as indexing but that too will not solve my problem -- View this message in context: http://lucene.472066.n3.nabble.com/is-phrase-search-possible-in-solr-tp4057312p4057802.html Sent from the Solr -

Re: is phrase search possible in solr

2013-04-21 Thread vicky desai
Hi Jack, Making a change in the schema, either the keyword tokenizer or the copy field option which you suggested, would require reindexing of the entire data. Is there an option wherein if I have a query in double quotes it simply ignores all the tokenizers and analyzers? -- View this message in context:

Re: What is Partition Split At SolrCloud?

2013-04-21 Thread Shalin Shekhar Mangar
The NewSolrCloudDesign document is not accurate. It was initially created to record ideas but the implementation of SolrCloud has evolved to be different from the design in that document. SOLR-3755 splits a given partition and creates two partitions of half the hash range of the parent partition.
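To make "two partitions of half the hash range of the parent" concrete, here is a small sketch that halves an inclusive integer range. It is not taken from the SOLR-3755 patch; the function name is hypothetical, and the signed 32-bit bounds in the usage note are an assumption about the hash space used only for illustration.

```python
def split_range(lo, hi):
    """Split the inclusive range [lo, hi] into two adjacent halves.

    Returns ((lo, mid), (mid + 1, hi)). Raises ValueError if the range
    holds fewer than two values and therefore cannot be split.
    """
    if lo >= hi:
        raise ValueError("range must contain at least two values")
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid + 1, hi)
```

For a parent covering a full signed 32-bit range, `split_range(-2**31, 2**31 - 1)` yields one child for the negative half and one for the non-negative half. Every hash value in the parent falls in exactly one of the two children.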

Re: is phrase search possible in solr

2013-04-21 Thread qibaoyuan
A shingle filter may help. I want to do a phrase search in solr without analyzers being applied to it eg - If I search for *DelhiDareDevil* (i.e. with inverted commas) it should search the exact text and not apply any analyzers or tokenizers on this field However if i search for

Re: stats.facet not working for timestamp field

2013-04-21 Thread J Mohamed Zahoor
It is a date field. <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/> ./zahoor On 19-Apr-2013, at 5:02 PM, Erick Erickson erickerick...@gmail.com wrote: I'm guessing that your timestamp is a tdate, which stores extra information in the index for fast

Re: Dynamic data model design questions

2013-04-21 Thread William Bell
You can store JSON in Solr as a string field. For searching you need to pull the values out into separate fields. To store JSON and use wt=json without messing with the field, try my patch, SOLR-4685, and there is a field patch to take XML and convert to JSON if you need that.
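The "store the JSON as a string, pull out separate fields for searching" idea can be sketched on the client side as follows. The field name `raw_json` and the function name are made up for the example, and the patch mentioned above is not involved; this only shows how a document might be shaped before it is sent for indexing.

```python
import json


def to_index_doc(doc_id, payload, search_keys):
    """Build a flat document: the whole payload as one JSON string,
    plus selected top-level keys copied into their own fields."""
    doc = {
        "id": doc_id,
        # Hypothetical stored-only string field holding the original JSON.
        "raw_json": json.dumps(payload, sort_keys=True),
    }
    for key in search_keys:
        if key in payload:
            doc[key] = payload[key]  # separate field intended for searching
    return doc
```

The extracted keys are what queries would run against, while `raw_json` can be handed back to the application and parsed with `json.loads`.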

Re: is phrase search possible in solr

2013-04-21 Thread vicky desai
Hi, If I use a shingle filter then all types of queries will be impacted. I want queries within double quotes to be an exact search, but for queries without double quotes all analyzers and tokenizers should be applied. Is there a setting or a configuration in schema.xml which can cater to this