Simulate "this IndexReader is closed" ?

2020-06-22 Thread Richard Goodman
Hi there,

I've spent time integrating the Solr Prometheus exporter into our Solr
environment. During this, I came across an issue where fetching the
core-level metrics was returning exceptions.

Digging into this further, I realised it's actually on the Solr side of
this, in particular, the metrics that come from the following group within
each core;

SEARCHER.*


An example of the output I was getting;

{
  "responseHeader":{
"status":500,
"QTime":44},
  "error":{
"msg":"this IndexReader is closed",
"trace":"org.apache.lucene.store.AlreadyClosedException: this
IndexReader is closed\n\tat
org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:257)\n\tat
org.apache.lucene.index.StandardDirectoryReader.getVersion(StandardDirectoryReader.java:339)\n\tat
org.apache.lucene.index.FilterDirectoryReader.getVersion(FilterDirectoryReader.java:127)\n\tat
org.apache.lucene.index.FilterDirectoryReader.getVersion(FilterDirectoryReader.java:127)\n\tat
org.apache.solr.search.SolrIndexSearcher.lambda$initializeMetrics$13(SolrIndexSearcher.java:2268)


I changed my metric calls to include "regex=^(?!SEARCHER).*" and the
results were coming through *(minus the SEARCHER metrics)*.
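
That is, something along these lines (host/port illustrative; regex is the
Metrics API's name-filter parameter):

curl 'http://localhost:8983/solr/admin/metrics?regex=^(?!SEARCHER).*'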

This was enough to unblock me from getting the rest of the metrics.
However, I want to revisit this and see what I can do, as from my point of
view this is a bug within Solr, because it breaks the entire metrics API *(for
example, if you just hit /solr/admin/metrics and the IndexReader is closed,
it'll return this message and 0 metrics will be collected/displayed)*.

My problem is, I'm not entirely sure how to replicate this error, and was
hoping I could find some guidance. I saw that the file "
org/apache/solr/search/SolrIndexSearcher.java" has the metrics in it, but
I got a bit lost at first glance.

If anyone has any information that could help me:
1. Replicate the issue
2. Explain what exactly it means when an IndexReader is closed
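
In case it helps anyone reproduce this outside Solr, here is a minimal
sketch of what triggers the exception (plain Lucene, no Solr involved;
assumes Lucene 8.x, so on 7.x swap ByteBuffersDirectory for RAMDirectory).
My reading of the trace is that a metrics gauge holds a reference to a
reader that has since been closed, which this simulates:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.AlreadyClosedException;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class ClosedReaderRepro {
  public static void main(String[] args) throws Exception {
    Directory dir = new ByteBuffersDirectory();
    // Create a minimal index so a reader can be opened on it.
    try (IndexWriter writer =
        new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      writer.addDocument(new Document());
    }
    DirectoryReader reader = DirectoryReader.open(dir);
    reader.close(); // any use of the reader past this point is invalid
    try {
      // the same call the SEARCHER gauge makes per the trace above
      reader.getVersion();
    } catch (AlreadyClosedException e) {
      System.out.println(e.getMessage()); // "this IndexReader is closed"
    }
  }
}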

I would be really grateful,

Kind Regards,
Richard Goodman


Re: solr core metrics & prometheus exporter - indexreader is closed

2020-05-11 Thread Richard Goodman
Hey Dwane,

Thanks for your email, gah, I should have mentioned that I had already
applied the patches from the 8.x branches onto the exporter *(such as the
fixed thread pooling that you mentioned)*. I still haven't gotten to the
bottom of the "IndexReader is closed" issue; I found that if it was present
on an instance, even calling just http://ip.address:port/solr/admin/metrics
would return that error and 0 metrics. If I added the following parameter
to the call, regex=^(?!SEARCHER).*, it was all fine.
I'm trying to wrap my head around the relationship between a Solr core and
an index searcher / reader in the code, but it's quite complicated;
similarly, I'm trying to understand how I could replicate this for testing
purposes. So if you have any guidance/advice in that area, it would be
greatly appreciated.

Cheers,

On Wed, 6 May 2020 at 21:36, Dwane Hall  wrote:

> Hey Richard,
>
> I noticed this issue with the exporter in the 7.x branch. If you look
> through the release notes for Solr since then there have been quite a few
> improvements to the exporter particularly around thread safety and
> concurrency (and the number of nodes it can monitor).  The version of the
> exporter can run independently of your Solr version, so my advice would be
> to download the most recent Solr version, check and modify the exporter
> start script for its library dependencies, extract these files to a
> separate location, and run this version against your 7.x instance. If you
> have the capacity to upgrade your Solr version this will save you having to
> maintain the exporter separately. Since making this change the exporter has
> not missed a beat and we monitor around 100 Solr nodes.
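> A rough sketch of the shape of that (version, paths and ZooKeeper address
> illustrative; the flags are from the exporter's documented usage):
>
> wget https://archive.apache.org/dist/lucene/solr/8.5.1/solr-8.5.1.tgz
> tar xzf solr-8.5.1.tgz && cd solr-8.5.1/contrib/prometheus-exporter
> ./bin/solr-exporter -p 9854 -z zk1:2181/solr \
>     -f ./conf/solr-exporter-config.xml -n 16
>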
>
> Good luck,
>
> Dwane
> --
> *From:* Richard Goodman 
> *Sent:* Tuesday, 5 May 2020 10:22 PM
> *To:* solr-user@lucene.apache.org 
> *Subject:* solr core metrics & prometheus exporter - indexreader is closed
>
> Hi there,
>
> I've been playing with the prometheus exporter for solr, and have created
> my config and have deployed it, so far, all groups were running fine (node,
> jetty, jvm), however, I'm repeatedly getting an issue with the core group;
>
> WARN  - 2020-05-05 12:01:24.812; org.apache.solr.prometheus.scraper.Async;
> Error occurred during metrics collection
> java.util.concurrent.ExecutionException:
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
> from server at http://127.0.0.1:8083/solr: Server Error
>
> request:
> http://127.0.0.1:8083/solr/admin/metrics?group=core&wt=json&version=2.2
> at
>
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> ~[?:1.8.0_141]
> at
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> ~[?:1.8.0_141]
> at
> org.apache.solr.prometheus.scraper.Async.lambda$null$1(Async.java:45)
> ~[solr-prometheus-exporter-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT
> e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:03]
> at
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
> ~[?:1.8.0_141]
> at
> java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
> ~[?:1.8.0_141]
> at
>
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
> ~[?:1.8.0_141]
> at
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> ~[?:1.8.0_141]
> at
>
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> ~[?:1.8.0_141]
> at
>
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
> ~[?:1.8.0_141]
> at
>
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
> ~[?:1.8.0_141]
> at
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> ~[?:1.8.0_141]
> at
> java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
> ~[?:1.8.0_141]
> at
>
> org.apache.solr.prometheus.scraper.Async.lambda$waitForAllSuccessfulResponses$3(Async.java:43)
> ~[solr-prometheus-exporter-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT
> e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:03]
> at
>
> java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
> ~[?:1.8.0_141]
> at
>
> java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
> ~[?:1.8.0_141]
> at
>
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> ~[?:1.8.0_141]
> at
>
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1595)
> ~[?:1.8.0_141]
>

solr core metrics & prometheus exporter - indexreader is closed

2020-05-05 Thread Richard Goodman
a:121)\n\tat
org.apache.solr.handler.admin.MetricsHandler.handleRequestBody(MetricsHandler.java:101)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:736)\n\tat
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:717)\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:395)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:341)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:502)\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\n\tat
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)\n\tat
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)\n\tat
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)\n\tat
java.lang.Thread.run(Thread.java:748)\n",
"code":500}}


Because of these errors, when I go to the endpoint the Prometheus exporter
serves the core metrics from, I get 0 metrics back. I have yet to
investigate the exporter further to determine whether, when some metrics
cannot be gathered and an error is thrown, no metrics get recorded at all.

Whilst I was working on SOLR-14325
<https://issues.apache.org/jira/browse/SOLR-14325>, Andrzej noted that the
core level metrics are only reported if there is an open SolrIndexSearcher.
I started looking at the code for this, but wanted to know if anyone else
has encountered this issue before; it seems to be very frequent with this
cluster I am testing (96 instances, each instance having around 450GB of
indexes on disk w/ 3-way replication).

I guess this also raises the question of whether a better response than a
500 status error should be returned when no metrics are available?

Kind regards,

-- 

Richard Goodman


autoscaling tlog+pull and replicationFactor

2020-04-02 Thread Richard Goodman
Hi there,

I'm currently using solr v7.7.2 *(I applied the patch SOLR-13674
<https://issues.apache.org/jira/browse/SOLR-13674> to my build to preserve
replica types for triggers)* and trying to set up a cluster with the
combination of TLOG+PULL replicas and utilising the solr autoscaling
feature to maintain the stability of the cluster.

I'm getting confused though with the documentation.

When creating a collection, if you specify "replicationFactor=3" then it'll
create 3 NRT replicas, and from my understanding, this information is
preserved within autoscaling, i.e. if a node went down, it would attempt to
add another replica to preserve the replicationFactor.

However, because I'm using TLOG + PULL, I don't add the
"replicationFactor=3" into my collection creation, otherwise it ends up
creating a collection with 3 NRT replicas, 1 TLOG and 2 PULL replicas,
which of course I do not want.

Instead, I add "tlogReplicas=1&pullReplicas=2", which works. However, when
it comes to creating policies, I'm not getting violations when expected.

I created the following policies:
{
  "replica": "<2",
  "shard": "#EACH",
  "type": "TLOG",
  "node": "#ANY"
},
{
  "replica": "<3",
  "shard": "#EACH",
  "type": "PULL",
  "node": "#ANY"
}
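
(For reference, policies like these are installed via the autoscaling write
API, along the lines of:
curl -X POST -H 'Content-type:application/json' \
  http://localhost:8983/solr/admin/autoscaling \
  -d '{"set-cluster-policy": [ ...the two rules above... ]}'
)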

And there appears to be 0 violations, however, when I add another replica,
there are still 0 violations despite there now being 4 replicas for a given
shard of a collection;

"violations":[],
"config":{
  "cluster-preferences":[{
  "minimize":"cores",
  "precision":1}
,{
  "maximize":"freedisk"}],
  "cluster-policy":[{
  "replica":"<2",
  "shard":"#EACH",
  "type":"TLOG",
  "node":"#ANY"}
,{
  "replica":"<3",
  "shard":"#EACH",
  "type":"PULL",
  "node":"#ANY"}]}},

And here is the information about a collection-shard in which I added an
extra replica to which I expected there to be a violation:
{
  "replica_p35": {
"core": "collection_four_shard1_replica_p35",
"shard": "shard1",
"collection": "collection_four",
"state": "active",
"shard_name": "collection_four_shard1",
"num_shards": 6,
"type": "PULL"
  },
  "replica_p31": {
"core": "collection_four_shard1_replica_p31",
"shard": "shard1",
"collection": "collection_four",
"state": "active",
"shard_name": "collection_four_shard1",
"num_shards": 6,
"type": "PULL"
  },
  "replica_p87": {
"core": "collection_four_shard1_replica_p87",
"shard": "shard1",
"collection": "collection_four",
"state": "active",
"shard_name": "collection_four_shard1",
"num_shards": 6,
"type": "PULL"
  },
  "replica_t75": {
"core": "collection_four_shard1_replica_t75",
"shard": "shard1",
"collection": "collection_four",
"state": "active",
"shard_name": "collection_four_shard1",
"type": "TLOG",
"leader": "true"
  }
}

Am I missing something in trying to preserve the replicationFactor for a
collection while making collections have 1 TLOG replica and 2 PULL replicas?

I tried adding the following
{"replicas": "<4", "shard":"#EACH", "node": "#ANY"}

However, still no luck

Equally, how would I then go about setting up triggers to only create a
PULL replica if a PULL goes down, and likewise for TLOG? Would a trigger
for each type be needed?
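
For reference, a minimal sketch of the shape of such a trigger, assuming
the stock ComputePlanAction/ExecutePlanAction pair (which is what the
SOLR-13674 patch teaches to preserve replica types), so a single nodeLost
trigger may cover both types:

curl -X POST -H 'Content-type:application/json' \
  http://localhost:8983/solr/admin/autoscaling -d '{
  "set-trigger": {
    "name": "node_lost_trigger",
    "event": "nodeLost",
    "waitFor": "120s",
    "enabled": true,
    "actions": [
      {"name": "compute_plan", "class": "solr.ComputePlanAction"},
      {"name": "execute_plan", "class": "solr.ExecutePlanAction"}
    ]
  }
}'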

Any guidance on this would be greatly appreciated
Thanks,
-- 

Richard Goodman


Use copyField with wildcard in source; how then to work out where a value came from?

2019-10-31 Thread Richard Walker
I've got a collection for which the schema has
a number of copyFields that have a wildcard in the source:

  <copyField source="skos_prefLabel-*" dest="skos_prefLabel_all"/>

The idea is that each document has language-specific values in
fields whose names end in a language tag,
i.e., "skos_prefLabel-en", "skos_prefLabel-de",
"skos_prefLabel-fr", etc.
Let's say for this example that we have a Solr document
with:
{ ...,
  "skos_prefLabel-en": "One",
  "skos_prefLabel-de": "Eins",
  "skos_prefLabel-fr": "Un",
  ...
}

[ Let's leave aside the issue of what the field
type for "skos_prefLabel_all" should be; let's assume I'm
happy for it to be (say) "text_en_splitting" and
(for now) I'll live with the fact that this is wrong. ]

The idea is to be able to do searching and highlighting
on one or more specific languages, and _also_ to
be able to do a language-independent search, or,
if you like, to search for values in all languages
in one go. I want to display details of matches
and highlighting _with their language information_.

The problem: suppose I get a match and some
highlighting against the field skos_prefLabel_all.
How do I know which field(s) the data _came_ from?

My guess: when using a copyField in this way
(i.e., with a wildcard in the source),
it's not (in general) possible to work backwards from the
destination field to work out which source field
the content came from.

If that is so, one way to get what I want would
seem to be to _not_ use a copyField, but to
construct the Solr documents such that they
already contain a value for skos_prefLabel_all,
let's say, ["One", "Eins", "Un"],
and (let's say) for another field skos_prefLabel_all_languages,
that would then in this case have the value ["en", "de", "fr"],
i.e., such that there's a one-to-one match
between the values of skos_prefLabel_all and the
corresponding values of skos_prefLabel_all_languages.
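
That is, documents shaped something like this (values from the example
above):

{ ...,
  "skos_prefLabel-en": "One",
  "skos_prefLabel-de": "Eins",
  "skos_prefLabel-fr": "Un",
  "skos_prefLabel_all": ["One", "Eins", "Un"],
  "skos_prefLabel_all_languages": ["en", "de", "fr"]
}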

Now I can display results with corresponding
language tags. Dealing with highlighting data
would still currently seem to be problematic,
but would be possible with something like
David Smiley's work at
https://issues.apache.org/jira/browse/SOLR-1954 .

Surely I'm missing something here.
Is there another/better way?

Richard.



Re: Metrics API - Documentation

2019-10-15 Thread Richard Goodman
Many thanks both for your responses, they've been helpful.

@Andrzej - Sorry I wasn't clear on "A latency of 1mil"; I wasn't aware the
image wouldn't come through. But following your bullet points helped me
present a better unit of measurement on the axis.

In regards to contributing, I would absolutely love to help there, I'm just
not sure what the correct direction is. I wasn't sure if the web page source
code / contributions are in the apache-lucene repository?

Thanks,


On Tue, 8 Oct 2019 at 11:04, Andrzej Białecki  wrote:

> Hi,
>
> Starting with Solr 7.0 all JMX metrics are actually internally driven by
> the metrics API - JMX (or Prometheus) is just a way of exposing them.
>
> I agree that we need more documentation on metrics - contributions are
> welcome :)
>
> Regarding your specific examples (btw. our mailing lists aggressively
> strip all attachments - your graphs didn’t make it):
>
> * time units in time-based counters are in nanoseconds. This is just a
> unit of value, not necessarily precision. In this specific example
> `ADMIN./admin/collections.totalTime` (and similarly named metrics for all
> other request handlers) represents the total elapsed time spent processing
> requests.
> * time-based histograms are expressed in milliseconds, where it is
> indicated by the “_ms” suffix.
> * 1-, 5- and 15-min rates represent an exponentially weighted moving
> average over that time window, expressed in events/second.
> * handlerStart is initialised with System.currentTimeMillis() when this
> instance of request handler is first created.
> * details on GC, memory buffer pools, and similar JVM metrics are
> documented in JDK documentation on Management Beans. For example:
>
> https://docs.oracle.com/javase/7/docs/api/java/lang/management/GarbageCollectorMXBean.html?is-external=true
> * "A latency of 1mil” - no idea what that is, I don’t think Solr API uses
> this abbreviation anywhere.
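>
> To make the first point concrete with the number from your example (the
> request count here is hypothetical): average time per request in ms is
> totalTime / requests / 1e6, so 6715327903 ns over, say, 1000 requests is
> roughly 6.7 ms per request.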
>
> Hope this helps.
>
> —
>
> Andrzej Białecki
>
> > On 7 Oct 2019, at 13:41, Emir Arnautović 
> wrote:
> >
> > Hi Richard,
> > We do not use API to collect metrics but JMX, but I believe that those
> are the same (did not verify it in code). You can see how we handled those
> metrics into reports/charts or even use our agent to send data to
> Prometheus:
> https://github.com/sematext/sematext-agent-integrations/tree/master/solr
> >
> > You can also see some links to Solr metric related blog posts in this
> repo. If you find out that managing your own monitoring stack is
> overwhelming, you can try our Solr integration.
> >
> > HTH,
> > Emir
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection
> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >
> >
> >
> >> On 7 Oct 2019, at 12:40, Richard Goodman 
> wrote:
> >>
> >> Hi there,
> >>
> >> I'm currently working on using the prometheus exporter to provide some
> detailed insights for our Solr Cloud clusters.
> >>
> >> Using the provided template killed our prometheus server, as well as
> the exporter due to the size of our clusters (each cluster is around 96
> nodes, ~300 collections with 3way replication and 16 shards), so you can
> imagine the amount of data that comes through /admin/metrics and not
> filtering it down first.
> >>
> >> I've begun working on writing my own template to reduce the amount of
> data being requested and it's working fine, and I'm starting to build some
> nice graphs in Grafana.
> >>
> >> The only difficulty I'm having with this, is I'm struggling to find
> decent documentation on the metrics themselves. I was using the resources
> metrics reporting - metrics-api <
> https://lucene.apache.org/solr/guide/7_7/metrics-reporting.html#metrics-api>
> and monitoring solr with prometheus and grafana <
> https://lucene.apache.org/solr/guide/7_7/monitoring-solr-with-prometheus-and-grafana.html>
> but there is a lack of information on most metrics.
> >>
> >> For example:
> >> "ADMIN./admin/collections.totalTime":6715327903,
> >> I understand this is a counter, however, I'm not sure what unit this
> would be represented when displaying it, for example:
> >>
> >>
> >>
> >> A latency of 1mil, not sure if this means milliseconds, million, etc.,
> >> Another example would be the GC metrics:
> >>  "gc

Unable to log into Jira

2019-10-15 Thread Richard Goodman
Hey,

Sorry if this is the wrong group, I tried to email us...@infra.apache.org a
few weeks ago but haven't heard anything.

I am unable to log into my account, with it saying my password is
incorrect. But what is more odd is my name on the account has changed from
Richard Goodman to Alex Goodman.

I can send a "forgot username" request, which comes through to my
registered email (this one). However, if I do a "forgot password", the
email never shows up.

Does anyone know which contact to use in order to help me sort this issue
out?

Thanks,

Richard Goodman


Metrics API - Documentation

2019-10-07 Thread Richard Goodman
Hi there,

I'm currently working on using the prometheus exporter to provide some
detailed insights for our Solr Cloud clusters.

Using the provided template killed our prometheus server, as well as the
exporter due to the size of our clusters *(each cluster is around 96 nodes,
~300 collections with 3way replication and 16 shards)*, so you can imagine
the amount of data that comes through /admin/metrics and not filtering it
down first.

I've begun working on writing my own template to reduce the amount of data
being requested and it's working fine, and I'm starting to build some nice
graphs in Grafana.

The only difficulty I'm having with this is that I'm struggling to find
decent documentation on the metrics themselves. I was using the resources metrics
reporting - metrics-api
<https://lucene.apache.org/solr/guide/7_7/metrics-reporting.html#metrics-api>
 and monitoring solr with prometheus and grafana
<https://lucene.apache.org/solr/guide/7_7/monitoring-solr-with-prometheus-and-grafana.html>
but
there is a lack of information on most metrics.

For example:

"ADMIN./admin/collections.totalTime":6715327903,

I understand this is a counter; however, I'm not sure what unit it would
be represented in when displaying it, for example:

[image: image.png]

A latency of 1mil, not sure if this means milliseconds, million, etc.,
Another example would be the GC metrics:

  "gc.ConcurrentMarkSweep.count":7,
  "gc.ConcurrentMarkSweep.time":1247,
  "gc.ParNew.count":16759,
  "gc.ParNew.time":884173,

Which when displayed, doesn't give the clearest insight as to what the unit is:

[image: image.png]

If anyone has any advice / guidance, that would be greatly
appreciated. If there isn't documentation for the API, then this is
also something I'll look into helping contribute.

Thanks,

-- 

Richard Goodman


Unified highlighter on result of query with two required terms which matched separate fields

2019-07-25 Thread Richard Walker
Hi, I'm trying to understand what's going on with
the combination of:

* Solr 8.1.1
* edismax parser
* qf with multiple fields specified (each of which has type
  text_en_splitting, some of which are multiValued)
* unified highlight method
* query with two terms
* results where the two terms match against _separate_ fields

when I make both of the two query terms _required_.

(Sample values for the query parameters to start with:
"q":"scope national"
"fl":"id,last_updated,slug,status,title,acronym,publisher,description,widgetable,sissvoc_endpoint,owner,[explain]"
"defType":"edismax"
"qf":"title_search^1 subject_search^0.5 description^0.01 concept_search^0.5 
publisher_search^0.5"
"hl":"on"
"hl.fl":"*"
"hl.method":"unified"
"hl.snippets":"10"
)

So far, so good: results are correct, and highlighting is correct.
In particular, for a result in which there is a match
for "scope" in one field (concept_search) and for "national"
in another (publisher_search), I get a highlighting result for
"scope" in concept_search and for "national" in publisher_search.
(I also get a highlight for another field concept_phrase which
has the same content as concept_search but with string type.)

All good so far.

But now if I change the query from

"q":"scope national"

to

"q":"+scope +national"

my results still (correctly) include the result in which there
was a match for "scope" in one field (concept_search) and for "national"
in another (publisher_search), but now there are no _highlights_
for that result!

What is even more counterintuitive is that if I now also set
"hl.requireFieldMatch":"true"
the highlights for the concept_search and publisher_search fields
(but not the concept_phrase field) come back!

Richard.



/get handler on simple collection alias only checking first collection in list

2019-07-22 Thread Richard Jones
On 8.1.1 and 7.7.2 I have a simple collection alias where /select is
working as expected, but /get is only checking the first underlying
collection for documents.

Is this expected behaviour?

My routing field is a string so I cannot use complex routing for
aliases. I wanted to see if it's a known bug or a known feature.

Steps to reproduce :

Create the collections :

$ /opt/solr/bin/solr create_collection -c first
...
Created collection 'first' with 1 shard(s), 1 replica(s) with config-set 'first'
$ /opt/solr/bin/solr create_collection -c second
...
Created collection 'second' with 1 shard(s), 1 replica(s) with
config-set 'second'

Create the alias :

$ curl 
'localhost:8983/solr/admin/collections?action=CREATEALIAS&name=simplealias&collections=first,second'
{
  "responseHeader":{
"status":0,
"QTime":133}}

Insert the test documents :

$ curl -d '[{"id":"1","site":"first"}]'
'localhost:8983/solr/first/update?commit=true' -H
"content-type:application/json"
{
  "responseHeader":{
"rf":1,
"status":0,
"QTime":235}}
$ curl -d '[{"id":"2","site":"second"}]'
'localhost:8983/solr/second/update?commit=true' -H
"content-type:application/json"
{
  "responseHeader":{
"rf":1,
"status":0,
"QTime":163}}

Select from the alias :

$ curl 'localhost:8983/solr/simplealias/select/?q=*:*'
{
  "responseHeader":{
"zkConnected":true,
"status":0,
"QTime":58,
"params":{
  "q":"*:*"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
  {
"id":"1",
"site":["first"],
"_version_":1639785169851252736},
  {
"id":"2",
"site":["second"],
"_version_":1639785186243641344}]
  }}

Get from the first underlying collection :

$ curl 'localhost:8983/solr/simplealias/get?id=1'
{
  "doc":
  {
"id":"1",
"site":["first"],
"_version_":1639785169851252736}}

Get from the second underlying collection :

$ curl 'localhost:8983/solr/simplealias/get?id=2'
{
  "doc":null}

Specifying the shard of the second collection yields a result,
indicating this is a routing issue? :

$ curl 
'localhost:8983/solr/simplealias/get?id=2=all=http://10.1.0.128:8983/solr/second_shard1_replica_n1/'
{
  "doc":
  {
"id":"2",
"site":["second"],
"_version_":1639785186243641344}}


Thanks,
Rich


Re: Unified highlighter with storeOffsetsWithPositions and termVectors giving an exception

2019-07-21 Thread Richard Walker
On 22 Jul 2019, at 11:32 am, Richard Walker  wrote:
> I'm trying out the advice in the user guide
> ( 
> https://lucene.apache.org/solr/guide/8_1/highlighting.html#schema-options-and-performance-considerations
>  )
> for using the unified highlighter.
> 
> ...
> * "set storeOffsetsWithPositions to true"
> * "set termVectors to true but no other term vector
>  related options on the field being highlighted"
...

I completely forgot to mention that I also tried _just_:

> * "set storeOffsetsWithPositions to true"

i.e., without _also_ setting termVectors, and this _doesn't_
give the exception.

So it seems to be the _combination_ of:
* unified highlighter
* storeOffsetsWithPositions
* termVectors

that seems to be giving the exception.



Unified highlighter with storeOffsetsWithPositions and termVectors giving an exception

2019-07-21 Thread Richard Walker
I'm trying out the advice in the user guide
( 
https://lucene.apache.org/solr/guide/8_1/highlighting.html#schema-options-and-performance-considerations
 )
for using the unified highlighter.

I saw the note:
"This is definitely the fastest option for highlighting
wildcard queries on large text fields."

and decided to try this, namely:

* "set storeOffsetsWithPositions to true"
* "set termVectors to true but no other term vector
  related options on the field being highlighted"

I've set these options on two fields, but I now get an
exception during highlighting of the results of a phrase query.
(I'm not even testing with wildcards yet.)

Here's an extract of the schema before making the change:

  
  
  
  
  
  

And here are the only two lines I changed:

  
  

Here's a sample minimal query that worked perfectly before making the change:

defType=edismax
q="space administration"
fl=id,title
qf=fulltext concept_search
hl=true
hl.method=unified
hl.fl=*

After making the change to the schema, I now get this exception in the Solr log:

o.a.s.s.HttpSolrCall null:java.lang.IllegalStateException: field "fulltext" was 
indexed without position data; cannot run PhraseQuery (phrase=fulltext:"space 
administr")
at 
org.apache.lucene.search.PhraseQuery$1.getPhraseMatcher(PhraseQuery.java:446)
at 
org.apache.lucene.search.PhraseWeight.lambda$matches$0(PhraseWeight.java:89)
at org.apache.lucene.search.MatchesUtils.forField(MatchesUtils.java:101)
at org.apache.lucene.search.PhraseWeight.matches(PhraseWeight.java:88)
at 
org.apache.lucene.search.DisjunctionMaxQuery$DisjunctionMaxWeight.matches(DisjunctionMaxQuery.java:125)
at 
org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumsWeightMatcher(FieldOffsetStrategy.java:138)
at 
org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumFromReader(FieldOffsetStrategy.java:74)
at 
org.apache.lucene.search.uhighlight.TermVectorOffsetStrategy.getOffsetsEnum(TermVectorOffsetStrategy.java:49)
at 
org.apache.lucene.search.uhighlight.FieldHighlighter.highlightFieldForDoc(FieldHighlighter.java:76)
at 
org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFieldsAsObjects(UnifiedHighlighter.java:639)
at 
org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFields(UnifiedHighlighter.java:508)
at 
org.apache.solr.highlight.UnifiedSolrHighlighter.doHighlighting(UnifiedSolrHighlighter.java:149)
at 
org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:171)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2566)
etc.

The response includes search results, but no highlighting information.

Of interest is that the exception is against the field "fulltext",
whose definition I _didn't_ change.

If I remove the "fulltext" field from qf, so that the query is now this:

defType=edismax
q="space administration"
fl=id,title
qf=concept_search
hl=true
hl.method=unified
hl.fl=*

the log now has this exception:

o.a.s.s.HttpSolrCall null:java.lang.IllegalStateException: field 
"concept_search" was indexed without position data; cannot run PhraseQuery 
(phrase=concept_search:"space administr")
at 
org.apache.lucene.search.PhraseQuery$1.getPhraseMatcher(PhraseQuery.java:446)
at 
org.apache.lucene.search.PhraseWeight.lambda$matches$0(PhraseWeight.java:89)
at org.apache.lucene.search.MatchesUtils.forField(MatchesUtils.java:101)
at org.apache.lucene.search.PhraseWeight.matches(PhraseWeight.java:88)
at 
org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumsWeightMatcher(FieldOffsetStrategy.java:138)
at 
org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumFromReader(FieldOffsetStrategy.java:74)
at 
org.apache.lucene.search.uhighlight.TermVectorOffsetStrategy.getOffsetsEnum(TermVectorOffsetStrategy.java:49)
at 
org.apache.lucene.search.uhighlight.FieldHighlighter.highlightFieldForDoc(FieldHighlighter.java:76)
at 
org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFieldsAsObjects(UnifiedHighlighter.java:639)
at 
org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFields(UnifiedHighlighter.java:508)
at 
org.apache.solr.highlight.UnifiedSolrHighlighter.doHighlighting(UnifiedSolrHighlighter.java:149)
at 
org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:171)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
at 

Re: Upload/use a plugin JAR in ZooKeeper

2019-07-18 Thread Richard Walker
On 19 Jul 2019, at 12:02 pm, Chee Yee Lim  wrote:
> Not sure if this is the recommended way, but I managed to use plugin JARs
> with Solr Cloud.
> 
> Either include the absolute path to JAR in solrconfig.xml, or put the JAR
> in a "lib" folder relative to your instanceDir. See the following text from
> solrconfig.xml.

As I already noted in my original message of 16 July:

> I've been able to get this to work the "simple" way,
> by putting the JAR in the file system, and specifying
> basic
> 
>  <lib dir="..." />
>  <lib path="..." />
> 
> values in solrconfig.xml. No problem doing it this way.

... and that this is precisely what I do _not_ want to do,
unless I have to.

I want to use a JAR file uploaded to the collection's znode,
as the user guide strongly suggests is possible.
(And also again, no, I don't want to configure/use the Blob Store.)



Re: Upload/use a plugin JAR in ZooKeeper

2019-07-18 Thread Richard Walker
On 16 Jul 2019, at 4:14 pm, Richard Walker  wrote:
> ...
> 
> To be specific, I'm trying to use this idea:
> 
> "Resources and plugins may be stored:
> • in ZooKeeper under a collection’s configset node (SolrCloud only);"
> 
> ...
> 
> So far, so good. But now how do I refer to the JAR in solrconfig.xml?
> The user guide doesn't really say.
> 
> ...
> 
> No success at all; I only get a ClassNotFoundException
> for the plugin class.
> 
> ...

I've now found this earlier thread:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201701.mbox/%3ccakhkodqv-y59+7m86ogvf1feqj6ieiogp8trhl1mg5fuajl...@mail.gmail.com%3e

in which the second message (from Shawn Heisey) says:

> I actually do not know what the path for lib directives is relative to
> when running SolrCloud.  Most things in a core config are relative to
> the location of the config file itself, but in this case, the config
> file is not on the filesystem at all, it's in zookeeper, and I don't
> think Solr can use jars in zookeeper.  

So is this the definitive answer? As I suggested in my
earlier message, the documentation in the user guide at
https://lucene.apache.org/solr/guide/8_1/resource-and-plugin-loading.html
strongly suggests that you _can_ use plugin JARs uploaded
to a collection's znode.

Richard.



HowtoConfigureIntelliJ link is broken

2019-07-18 Thread Richard Goodman
Hi there,

I went to set up the repo with IntelliJ, but it was having some problems
figuring out the source folders etc., so I went to navigate to the
following link <https://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ>
as I remember from the past there were a few commands that helped; however,
it appears to be broken. I used a website archiver to retrieve the original
contents, but wasn't sure if it had already been raised.

Thanks,

-- 

Richard Goodman|Data Infrastructure engineer

richa...@brandwatch.com


NEW YORK   | BOSTON   | BRIGHTON   | LONDON   | BERLIN |   STUTTGART |
PARIS   | SINGAPORE | SYDNEY

<https://www.brandwatch.com/blog/digital-consumer-intelligence/>


Upload/use a plugin JAR in ZooKeeper

2019-07-16 Thread Richard Walker
Hi, I'm trying to use a plugin JAR containing
a custom query parser.

I've been able to get this to work the "simple" way,
by putting the JAR in the file system, and specifying
basic

  <lib dir="..." />
  <lib path="..." />

values in solrconfig.xml. No problem doing it this way.

But I'm running in SolrCloud mode and I'd like to take
advantage of an option that the user guide seems to offer
at this page:

https://lucene.apache.org/solr/guide/8_1/resource-and-plugin-loading.html

But, so far, I don't see how to make it work.

To be specific, I'm trying to use this idea:

"Resources and plugins may be stored:
• in ZooKeeper under a collection’s configset node (SolrCloud only);"

Note: I'm _not_ trying to do the _third_ option listed, i.e.,
"• in Solr’s Blob Store (SolrCloud only)", that uses
the ".system" collection.

The user guide seems to suggest that I can upload the JAR
to the collection's config using zk cp:
"To upload a plugin or resource to a configset
already stored on ZooKeeper, you can use bin/solr zk cp."

So, I've used zk cp to upload the JAR to
zk:/configs/my_collection/my_plugin.jar
(I also tried various other subdirectories such as
zk:/configs/my_collection/lib/my_plugin.jar)
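
(Concretely, the upload was along the lines of
bin/solr zk cp file:/local/path/my_plugin.jar zk:/configs/my_collection/my_plugin.jar -z localhost:2181
with paths and ZooKeeper address illustrative.)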

So far, so good. But now how do I refer to the JAR in solrconfig.xml?
The user guide doesn't really say.

I've tried specifying the location of the JAR
with various values of the <lib> element.

No success at all; I only get a ClassNotFoundException
for the plugin class.

Could someone please tell me what I'm missing, i.e., what
I need to do to use a plugin JAR stored
"in ZooKeeper under a collection’s configset node"?

Richard.



Suggestions API for system properties

2019-04-02 Thread Richard Goodman
Hi there,

I have been slowly building up my cluster policies to reproduce the rules
we set on our collections when they are created *(pre v7)*. One of these
rules was rack awareness, which I've implemented by the following:

{
  "replica": "#EQUAL",
  "shard": "#EACH",
  "sysprop.racklocation": "#EACH"
}
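
(For context: each node advertises this via a JVM system property set at
startup, along the lines of bin/solr start -c -Dracklocation=/rack/001;
the sysprop. prefix in the policy maps onto that property name.)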

Whilst this works and shows valid violations as shown below:

"collection":"collection_name",
"shard":"shard2",
"tagKey":"/rack/001",
"violation":{
  "replica":{
"NRT":2,
"count":2},
  "delta":1.0},
"clause":{
  "replica":"#EQUAL",
  "shard":"#EACH",
  "sysprop.racklocation":"#EACH",
  "collection":"collection_name"},
"violatingReplicas":[{
"core_node45":{
  "core":"collection_name_shard2_replica3",
  "shard":"shard2",
  "collection":"collection_name",
  "node_name":"127.0.0.1:8080_solr",
  "type":"NRT",
  "base_url":"http://127.0.0.1:8080/solr;,
  "state":"active",
  "force_set_state":"false",
  "INDEX.sizeInGB":12.280795318074524}}
,{
"core_node2":{
  "core":"collection_name_shard2_replica1",
  "shard":"shard2",
  "collection":"collection_name",
  "node_name":"127.0.0.2:8083_solr",
  "type":"NRT",
  "leader":"true",
  "base_url":"http://127.0.0.2:8083/solr;,
  "state":"active",
  "force_set_state":"false",
  "INDEX.sizeInGB":12.24499356560409}}]},

As you can see there are two replicas which are on rack "rack/001" which
isn't allowed. However when going onto the /autoscaling/suggestions
endpoint, nothing is returned:
{
"responseHeader":{
"status":0,
"QTime":43848},
"suggestions":[],
"WARNING":"This response format is experimental. It is likely to change in
the future."}

I experimented by explicitly stating the racks that are present in the
cluster, i.e.
{
  "replica": "#EQUAL",
  "shard": "#EACH",
  "sysprop.racklocation": ["/rack/001", "/rack/002", "/rack/003",
"/rack/004"]
}

I hoped that Solr would be able to use this to deduce where to place
violating replicas; however, this still doesn't work.

I was wondering if anyone has had similar experience with using system
properties for cluster policies, and how it affects the suggestions
endpoint, as I'm having difficulty getting results from this.

Cheers,
Richard Goodman


Re: Autoscaling rack awareness

2019-03-27 Thread Richard Goodman
So I managed to get this working by the following policy:

{"replica":"<2","shard":"#EACH","sysprop.racklocation": "#EACH"}


On Tue, 26 Mar 2019 at 14:03, Richard Goodman 
wrote:

> Hi, I'm currently running into some trouble trying to set up rack
> awareness as a cluster policy.
>
> I run my cluster with 3 way replication, currently a few collection-shards
> have 4 replicas, which shows as violations under my current set policies:
>
> {
> "set-cluster-policy":[
> {
> "replica":"<2",
> "shard":"#EACH",
> "node":"#ANY"
> },
> {
> "replica":0,
> "freedisk":"<50",
> "strict":false
> }
> ]
> }
>
> {
> "collection":"collection_name_one",
> "shard":"shard12",
> "node":"1.2.3.4:8080_solr",
> "tagKey":"1.2.3.4:8080_solr",
> "violation":{
>
> "replica":"org.apache.solr.client.solrj.cloud.autoscaling.ReplicaCount:{\n
> \"NRT\":2,\n  \"PULL\":0,\n  \"TLOG\":0,\n  \"count\":2}",
>   "delta":1},
> "clause":{
>   "replica":"<2",
>   "shard":"#EACH",
>   "node":"#ANY",
>   "collection":"collection_name_one"}
> },
>
> I want to implement rack awareness as a policy, there are examples of
> availability zone policies, however, not really anything for rack
> awareness. Currently we set this when creating a collection:
>
> sysprop.racklocation:*,shard:*,replica:<2
>
> So I tried to implement this via the following policy rule
>
> {"replica": "<2", "shard": "#EACH", "sysprop.racklocation": "*"}
>
> However, this hasn't worked *(because with the extra replication I have
> atm, it would certainly raise this as a violation)*, so I'm not sure how
> I can implement this.
> I saw in the 7.7 docs this following example:
> {"replica":"#ALL", "shard":"shard1", "sysprop.rack":"730"}
> However, this forces shard 1 of all replicas to belong to a certain rack,
> which I don't want to do, I'd rather the replicas have free choice of where
> they are placed, providing if two replicas appear on the same racklocation,
> it would raise a violation.
>
> Has anyone had experience of setting something like this up, or have any
> advice / see an error in my policy set up?
>
> *(Currently running solr 7.4)*
>
> Thanks,
> Richard
>


-- 

Richard Goodman|Data Infrastructure Engineer

richa...@brandwatch.com


NEW YORK   | BOSTON  | BRIGHTON   | LONDON   | BERLIN   |   STUTTGART   |
SINGAPORE   | SYDNEY | PARIS


<https://www.brandwatch.com/blog/brandwatch-and-crimson-hexagon/>


Autoscaling rack awareness

2019-03-26 Thread Richard Goodman
Hi, I'm currently running into some trouble trying to set up rack awareness
as a cluster policy.

I run my cluster with 3 way replication, currently a few collection-shards
have 4 replicas, which shows as violations under my current set policies:

{
"set-cluster-policy":[
{
"replica":"<2",
"shard":"#EACH",
"node":"#ANY"
},
{
"replica":0,
"freedisk":"<50",
"strict":false
}
]
}

{
"collection":"collection_name_one",
"shard":"shard12",
"node":"1.2.3.4:8080_solr",
"tagKey":"1.2.3.4:8080_solr",
"violation":{

"replica":"org.apache.solr.client.solrj.cloud.autoscaling.ReplicaCount:{\n
\"NRT\":2,\n  \"PULL\":0,\n  \"TLOG\":0,\n  \"count\":2}",
  "delta":1},
"clause":{
  "replica":"<2",
  "shard":"#EACH",
  "node":"#ANY",
  "collection":"collection_name_one"}
},

I want to implement rack awareness as a policy, there are examples of
availability zone policies, however, not really anything for rack
awareness. Currently we set this when creating a collection:

sysprop.racklocation:*,shard:*,replica:<2

So I tried to implement this via the following policy rule

{"replica": "<2", "shard": "#EACH", "sysprop.racklocation": "*"}

However, this hasn't worked *(because with the extra replication I have
atm, it would certainly raise this as a violation)*, so I'm not sure how I
can implement this.
I saw in the 7.7 docs this following example:
{"replica":"#ALL", "shard":"shard1", "sysprop.rack":"730"}
However, this forces shard 1 of all replicas to belong to a certain rack,
which I don't want to do, I'd rather the replicas have free choice of where
they are placed, providing if two replicas appear on the same racklocation,
it would raise a violation.

Has anyone had experience of setting something like this up, or have any
advice / see an error in my policy set up?

*(Currently running solr 7.4)*

Thanks,
Richard


Re: HOW DO I UNSUBSCRIBE FROM GROUP?

2017-10-16 Thread Richard
The list help/unsubscribe/post/etc. details are, as is not uncommon,
in the message header:

  List-Help: <mailto:solr-user-help@lucene.apache.org>
  List-Unsubscribe: <mailto:solr-user-unsubscribe@lucene.apache.org>
  List-Post: <mailto:solr-user@lucene.apache.org>

of all messages posted to the list.


 Original Message 
> Date: Monday, October 16, 2017 09:16:08 -0400
> From: Gus Heck 
> To: solr-user@lucene.apache.org
> Subject: Re: HOW DO I UNSUBSCRIBE FROM GROUP?
>
> While this has been the traditional response, and it's accurate and
> helpful, the user that complained about no unsubscribe link has a
> point. This is the normal expectation in this day and age. Maybe
> Apache should consider appending a "You are receiving this because
> you are subscribed to (list) click here to unsubscribe" line, but I
> know that if I hadn't been dealing with various apache mailing
> lists on and off for 15 years and I found that I was getting emails
> with no unsubscribe links in undesired quantities, the spam bucket
> would be my answer (probably never send the email asking for how to
> unsubscribe). That's certainly the policy I use for any marketing
> type mails (no unsubscribe == spam bucket)... A simple unsubscribe
> tagline could help us not get tagged as spam, and avoid this type
> of email (which has been a regular occurrence for 15 years)
> 
> -Gus
> 
> Hi,
>> 
>> If you wish the emails to "stop", kindly "UNSUBSCRIBE"  by
>> following the instructions on the
>> http://lucene.apache.org/solr/community.html. Hope this
>> helps.
>> 
>> 

 End Original Message 




Partial Field Update "removeregex" Command

2016-12-07 Thread Richard Bergmann
Hello,

I am new to this and have found no examples or guidance on how to use 
"removeregex" to remove (in my case) all entries in a multi-valued field.

The following curl commands work just fine:

curl . . . -d '[{"id":"docId","someField":{"add":["val1","val2"]}}]'

and

curl . . . -d '[{"id":"docId","someField":{"remove":["val1","val2"]}}]'


None of the following have any effect, however:

curl . . . -d '[{"id":"docId","someField":{"removeregex":"val1"}}]'

curl . . . -d '[{"id":"docId","someField":{"removeregex":".*"}}]'
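
I am aware that an atomic "set" to null should clear the whole field, e.g.:

curl . . . -d '[{"id":"docId","someField":{"set":null}}]'

but I'd still like to understand why "removeregex" has no effect.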


I appreciate your help and hope your answer makes it out on the cloud somewhere 
so others can find the solution.

Regards,

Rich Bergmann


Re: Geo Aggregations and Search Alerts in Solr

2015-02-24 Thread Richard Gibbs
Hi Charlie,

Thanks a lot for your response

On Tue, Feb 24, 2015 at 5:08 PM, Charlie Hull char...@flax.co.uk wrote:

 On 24/02/2015 03:03, Richard Gibbs wrote:

 Hi There,

 I am in the process of choosing a search technology for one of my projects
 and I was looking into Solr and Elasticsearch.

 Two features that I am most interested in are geo aggregations (for map
 clustering) and search alerts. Elasticsearch seems to have these two
 features built-in.

 http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/geo-aggs.html
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-percolate.html

 I couldn't find relevant documentation for Solr and therefore not sure
 whether these features are readily available in Solr. Can you please let
 me
 know whether these features are available in Solr? If not, whether there
 are solutions to achieve same with Solr.


 Hi Richard,

 I don't know about geo aggregations, although I know the Heliosearch guys
 and others have been working on various facet statistics that may impinge
 on this. http://heliosearch.org/solr-facet-functions/

 For alerting, you're talking about storing queries and running them
 against any new document to see if it matches. We do this a lot for clients
 needing large scale media monitoring and auto-classification - here's the
 Lucene-based library we released:
 https://github.com/flaxsearch/luwak
 Note that this depends on a patched Lucene currently, but I'm very happy
 to say that a client is funding us to merge this back to trunk and we
 expect Luwak to be usable with a 5.x release of Lucene. More news very soon!
 There are a couple of videos on that page that will explain further. We
 suspect our approach is considerably faster than the Percolator, and it's
 on the list to benchmark the two.

 Cheers

 Charlie


 Thank you.



 --
 Charlie Hull
 Flax - Open Source Enterprise Search

 tel/fax: +44 (0)8700 118334
 mobile:  +44 (0)7767 825828
 web: www.flax.co.uk



Geo Aggregations and Search Alerts in Solr

2015-02-23 Thread Richard Gibbs
Hi There,

I am in the process of choosing a search technology for one of my projects
and I was looking into Solr and Elasticsearch.

Two features that I am most interested in are geo aggregations (for map
clustering) and search alerts. Elasticsearch seems to have these two
features built-in.

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/geo-aggs.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-percolate.html

I couldn't find relevant documentation for Solr and therefore not sure
whether these features are readily available in Solr. Can you please let me
know whether these features are available in Solr? If not, whether there
are solutions to achieve same with Solr.

Thank you.


remove requests [Mongo DB Users]

2014-09-15 Thread Richard
Folks -- These remove requests (more than a half dozen so far) are
rather foolish.

The person who sent the original Mongo DB Users message simply
spammed the solr-user@lucene ... list, they didn't send to
individual user addresses. So, if your remove request is intended
for the spammer, it's fairly meaningless as (because of the list
reply-to) you're replying to/re-spamming the whole list, not sending
to the original spammer (who doesn't have your individual address on
a list in the first place).

If you want off the solr list, use the instructions found in the
headers of all list messages:

 List-Help: mailto:solr-user-h...@lucene.apache.org
 List-Unsubscribe: mailto:solr-user-unsubscr...@lucene.apache.org

This list is not filtered by topic, so you either get all the
messages, or none -- i.e., there is no concept of being dropped from
a specific thread (as at least one person requested).

[My apologies to everyone else for this message -- just trying to
stem the flow before it becomes an epidemic.]

   - Richard



 Original Message 
 Date: Monday, September 15, 2014 21:30:35 +
 From: Michael Beccaria mbecca...@paulsmiths.edu
 To: solr-user@lucene.apache.org
 Subject: RE: Mongo DB Users

 Remove
 
 From: Aaron Susan aaronsus...@gmail.com
 Sent: Monday, September 15, 2014 11:35 AM
 To: Aaron Susan
 Subject: Mongo DB Users
 
 Hi,
 
 I am here to inform you that we are having a contact list of
 *Mongo DB Users *would you be interested in it?
 
 Data Field’s Consist Of: Name, Job Title, Verified Phone Number,
 Verified Email Address, Company Name  Address Employee Size,
 Revenue size, SIC Code, Industry Type etc.,
 
 We also provide other technology users as well depends on your
 requirement.
 
 For Example:
 
 
 *Red Hat *
 
 *Terra data *
 
 *Net-app *
 
 *NuoDB*
 
 *MongoHQ ** and many more*
 
 
 We also provide IT Decision Makers, Sales and Marketing Decision
 Makers, C-level Titles and other titles as per your requirement.
 
 Please review and let me know your interest if you are looking for
 above mentioned users list or other contacts list for your
 campaigns.
 
 Waiting for a positive response!
 
 Thanks
 
 *Aaron Susan*
 Data Specialist
 
 If you are not the right person, feel free to forward this email
 to the right person in your organization. To opt out response
 Remove

 End Original Message 




Re: Delta import throws java heap space exception

2014-03-13 Thread Richard Marquina Lopez
Hi Furkan,

sure, this is my data-config.xml:

<dataConfig>
  <document>
    <entity name="item" pk="id" dataSource="store_db" onError="skip"
      query="SELECT IT.* FROM item AS IT JOIN order AS ORD ON
        IT.order_id=ORD.id WHERE (IT.status=1 AND ORD.status=1)"
      deltaQuery="SELECT IT.* FROM item IT, order ORD, customer CUST
        WHERE IT.order_id1=ORD.id AND ORD.customer_id=CUST.id AND
        (IT.last_modified_date &gt; DATE_SUB('${dataimporter.last_index_time}',
        INTERVAL 1 MINUTE) OR ORD.last_modified_date &gt;
        DATE_SUB('${dataimporter.last_index_time}', INTERVAL 1 MINUTE) OR
        CUST.last_modified_date &gt; DATE_SUB('${dataimporter.last_index_time}',
        INTERVAL 1 MINUTE))"
      deltaImportQuery="SELECT * FROM item WHERE id='${dataimporter.delta.id}'"
      deletedPkQuery="SELECT IT.id FROM item AS IT JOIN order AS ORD ON
        IT.order_id=ORD.id WHERE (IT.status!=1 OR ORD.status!=1) AND
        (IT.last_modified_date &gt; DATE_SUB('${dataimporter.last_index_time}',
        INTERVAL 1 MINUTE) OR ORD.last_modified_date &gt;
        DATE_SUB('${dataimporter.last_index_time}', INTERVAL 1 MINUTE))">

      <field column="id" name="id"/>
      ...

      <entity name="product" dataSource="store_db" query="SELECT PR.name,
        PR.description FROM product PR, item IT WHERE IT.product_id = PR.id
        AND IT.id='${item.id}'">
        <field column="name" name="name"/>
        ...
      </entity>
      ...
    </entity>
  </document>
</dataConfig>

Currently I have 2.1 million activities.
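
One thing I plan to try, based on my reading of DocBuilder.collectDelta
(it buffers every row the deltaQuery returns before running the
deltaImportQuery): trimming the deltaQuery to return only the primary key
instead of IT.*, along the lines of

deltaQuery="SELECT IT.id FROM item IT, order ORD, customer CUST WHERE ...
  (same conditions as above)"

so that 2.1 million full rows don't have to sit on the heap at once.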

Thanks a lot,
Richard


2014-03-12 19:16 GMT-04:00 Furkan KAMACI furkankam...@gmail.com:

 Hi;

 Could you send your data-config.xml?

 Thanks;
 Furkan KAMACI


 2014-03-13 1:01 GMT+02:00 Richard Marquina Lopez 
 richard.marqu...@gmail.com
 :

  Hi Ahmet,
 
  Thank you for your response, currently I have the next configuration for
  JVM:
-XX:+PrintGCDetails -XX:-UseParallelGC -XX:SurvivorRatio=8 -XX:NewRatio=2
   -XX:+HeapDumpOnOutOfMemoryError -XX:PermSize=128m -XX:MaxPermSize=256m
   -Xms1024m -Xmx2048m
  I have 3.67 GB of physical RAM and 2GB is assigned to the JVM (-Xmx2048m)
 
 
  2014-03-12 17:32 GMT-04:00 Ahmet Arslan iori...@yahoo.com:
 
   Hi Richard,
  
   How much ram do you assign to java heap? Try increasing it to 1 gb for
   example.
   Please see : https://wiki.apache.org/solr/ShawnHeisey
  
   Ahmet
  
  
  
   On Wednesday, March 12, 2014 10:53 PM, Richard Marquina Lopez 
   richard.marqu...@gmail.com wrote:
  
   Hi,
  
    I have some problems when executing the delta import with 2 million rows
    from a MySQL database:

    java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
        at java.nio.CharBuffer.allocate(CharBuffer.java:331)
        at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:777)
        at java.nio.charset.Charset.decode(Charset.java:810)
        at com.mysql.jdbc.StringUtils.toString(StringUtils.java:2010)
        at com.mysql.jdbc.ResultSetRow.getString(ResultSetRow.java:820)
        at com.mysql.jdbc.BufferRow.getString(BufferRow.java:541)
        at com.mysql.jdbc.ResultSetImpl.getStringInternal(ResultSetImpl.java:5812)
        at com.mysql.jdbc.ResultSetImpl.getString(ResultSetImpl.java:5689)
        at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:4986)
        at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:5175)
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.getARow(JdbcDataSource.java:315)
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$700(JdbcDataSource.java:254)
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.next(JdbcDataSource.java:294)
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.next(JdbcDataSource.java:286)
        at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:117)
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextModifiedRowKey(SqlEntityProcessor.java:86)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextModifiedRowKey(EntityProcessorWrapper.java:267)
        at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:781)
        at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:338)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:223)
        at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:440)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:478)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:457)

   --

    java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@47a034e7 is still active

Delta import throws java heap space exception

2014-03-12 Thread Richard Marquina Lopez
Hi,

I have some problems when executing the delta import with 2 million rows
from a MySQL database:

java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
    at java.nio.CharBuffer.allocate(CharBuffer.java:331)
    at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:777)
    at java.nio.charset.Charset.decode(Charset.java:810)
    at com.mysql.jdbc.StringUtils.toString(StringUtils.java:2010)
    at com.mysql.jdbc.ResultSetRow.getString(ResultSetRow.java:820)
    at com.mysql.jdbc.BufferRow.getString(BufferRow.java:541)
    at com.mysql.jdbc.ResultSetImpl.getStringInternal(ResultSetImpl.java:5812)
    at com.mysql.jdbc.ResultSetImpl.getString(ResultSetImpl.java:5689)
    at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:4986)
    at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:5175)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.getARow(JdbcDataSource.java:315)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$700(JdbcDataSource.java:254)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.next(JdbcDataSource.java:294)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.next(JdbcDataSource.java:286)
    at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:117)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextModifiedRowKey(SqlEntityProcessor.java:86)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextModifiedRowKey(EntityProcessorWrapper.java:267)
    at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:781)
    at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:338)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:223)
    at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:440)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:478)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:457)

--

java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@47a034e7 is still active.
No statements may be issued when any streaming result sets are open and in
use on a given connection.
Ensure that you have called .close() on any active streaming result sets
before attempting more queries.
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:927)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:924)
    at com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:3361)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2524)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2778)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2828)
    at com.mysql.jdbc.ConnectionImpl.rollbackNoChecks(ConnectionImpl.java:5204)
    at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:5087)
    at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4690)
    at com.mysql.jdbc.ConnectionImpl.close(ConnectionImpl.java:1649)
    at org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:436)
    at org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:421)
    at org.apache.solr.handler.dataimport.DocBuilder.closeEntityProcessorWrappers(DocBuilder.java:288)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:277)
    at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:440)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:478)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:457)

Currently I have the batchSize parameter set to -1

Configuration:
- SOLR 4.4
- Centos 5.5
- 2GB RAM
- 1 Processor

Does someone have the same error?
Could someone help me, please?

Thank you,
Richard
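
For context, batchSize=-1 on the DIH JDBC dataSource puts MySQL Connector/J
into streaming mode, which is also what produces the second exception above:
while a streaming result set is open, no other statement may run on the same
connection. A minimal plain-JDBC sketch of what the driver requires
(connection URL, credentials and query are illustrative):

import java.sql.*;

public class StreamingQuery {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/store", "user", "pass");
        // Connector/J only streams when the statement is forward-only,
        // read-only, and the fetch size is Integer.MIN_VALUE.
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(Integer.MIN_VALUE);
        ResultSet rs = stmt.executeQuery("SELECT id FROM item");
        while (rs.next()) {
            // process one row at a time; rows are not buffered in the heap
        }
        rs.close();   // a streaming result set must be fully read or closed
        stmt.close(); // before anything else runs on this connection -- the
        conn.close(); // SQLException above is that rule being violated
    }
}

Note also that the OutOfMemoryError strikes in DocBuilder.collectDelta, which
gathers the modified row keys in memory, so a delta covering millions of rows
needs a correspondingly large heap regardless of streaming.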


Re: Delta import throws java heap space exception

2014-03-12 Thread Richard Marquina Lopez
Hi Ahmet,

Thank you for your response. Currently I have the following configuration for
the JVM:
-XX:+PrintGCDetails -XX:-UseParallelGC -XX:SurvivorRatio=8 -XX:NewRatio=2
-XX:+HeapDumpOnOutOfMemoryError -XX:PermSize=128m -XX:MaxPermSize=256m
-Xms1024m -Xmx2048m
I have 3.67 GB of physical RAM and 2 GB is assigned to the JVM (-Xmx2048m)


2014-03-12 17:32 GMT-04:00 Ahmet Arslan iori...@yahoo.com:

 Hi Richard,

 How much ram do you assign to java heap? Try increasing it to 1 gb for
 example.
 Please see : https://wiki.apache.org/solr/ShawnHeisey

 Ahmet



 On Wednesday, March 12, 2014 10:53 PM, Richard Marquina Lopez 
 richard.marqu...@gmail.com wrote:

 Hi,

 I have some problems when executing the delta import with 2 million rows
 from a MySQL database:

 java.lang.OutOfMemoryError: Java heap space
     at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
     at java.nio.CharBuffer.allocate(CharBuffer.java:331)
     at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:777)
     at java.nio.charset.Charset.decode(Charset.java:810)
     at com.mysql.jdbc.StringUtils.toString(StringUtils.java:2010)
     at com.mysql.jdbc.ResultSetRow.getString(ResultSetRow.java:820)
     at com.mysql.jdbc.BufferRow.getString(BufferRow.java:541)
     at com.mysql.jdbc.ResultSetImpl.getStringInternal(ResultSetImpl.java:5812)
     at com.mysql.jdbc.ResultSetImpl.getString(ResultSetImpl.java:5689)
     at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:4986)
     at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:5175)
     at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.getARow(JdbcDataSource.java:315)
     at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$700(JdbcDataSource.java:254)
     at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.next(JdbcDataSource.java:294)
     at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.next(JdbcDataSource.java:286)
     at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:117)
     at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextModifiedRowKey(SqlEntityProcessor.java:86)
     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextModifiedRowKey(EntityProcessorWrapper.java:267)
     at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:781)
     at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:338)
     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:223)
     at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:440)
     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:478)
     at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:457)
 
 --

 java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@47a034e7 is still active.
 No statements may be issued when any streaming result sets are open and in
 use on a given connection.
 Ensure that you have called .close() on any active streaming result sets
 before attempting more queries.
     at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:927)
     at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:924)
     at com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:3361)
     at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2524)
     at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2778)
     at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2828)
     at com.mysql.jdbc.ConnectionImpl.rollbackNoChecks(ConnectionImpl.java:5204)
     at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:5087)
     at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4690)
     at com.mysql.jdbc.ConnectionImpl.close(ConnectionImpl.java:1649)
     at org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:436)
     at org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:421)
     at org.apache.solr.handler.dataimport.DocBuilder.closeEntityProcessorWrappers(DocBuilder.java:288)
     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:277)
     at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:440)
     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:478)
     at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:457)

 Currently I have the batchSize parameter set to -1

 Configuration:
 - SOLR 4.4
 - Centos 5.5
 - 2GB RAM
 - 1 Processor

 Does someone have the same

Queries with conditional field inclusions?

2013-10-25 Thread Richard Frovarp
I'm trying to put together a query in Solr that is becoming rather 
complicated, and I'm not quite sure where to even start.


I'm building a directory, which for simplicity sake contains:
First Name
Last Name
Department Name (if faculty / staff)
User Types - faculty, staff, student - multivalued.

I want one search field to search first, last, and department. I have 
that working. However, that means you can find students using a first 
name only search, which isn't exactly desirable, but it is desirable for 
faculty and staff.


So I want:

Search Department Name + Last Name every time
include First Name if user type in (faculty, staff) or if another token 
matched last name.


So searching for "richard" would only work for me if I'm marked as 
faculty or staff. However, searching for "frovarp richard" would still find 
my record if I was marked as a student, as the "frovarp" piece would match 
against last names.


Any suggestions or ideas? I'm testing against Solr 4.3.1, but will 
probably be updating to 4.5.1 anyway.


I'm open to multiple cores or searches. I actually have 4 different 
first and last name fields (full, ngram, two phonetic), so scoring 
becomes important.


Thanks,
Richard
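
One way to express the conditional inclusion described above is to build the
gate directly into the query string: last name and department match
unconditionally, while the first-name clause is restricted by user type. A
minimal SolrJ sketch with illustrative field names (firstName, lastName,
department, userType); the "another token matched last name" case is handled
by pairing the tokens:

import org.apache.solr.client.solrj.SolrQuery;

public class DirectoryQuery {

    // Single token: first name only counts for faculty/staff.
    static SolrQuery oneToken(String t) {
        return new SolrQuery("lastName:" + t
                + " OR department:" + t
                + " OR (firstName:" + t + " AND userType:(faculty OR staff))");
    }

    // Two tokens: also allow first name when the other token hits last name.
    static SolrQuery twoTokens(String a, String b) {
        return new SolrQuery("(lastName:" + a + " AND firstName:" + b + ")"
                + " OR (lastName:" + b + " AND firstName:" + a + ")"
                + " OR " + oneToken(a).getQuery()
                + " OR " + oneToken(b).getQuery());
    }
}

Scoring across the four name variants (full, ngram, two phonetic) could then
be layered on top with per-field boosts.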


Re: Questions developing custom functionquery

2013-10-10 Thread Richard Lee
seems what you got is the indexed terms rather than the raw data: if resname
is tokenized, strVal(doc) only sees individual terms, which would explain the
seemingly random partial values. Indexing the field as a non-tokenized string
type (or copying it to one with copyField) should give the function query the
whole value. Maybe also check the API docs for more details.
On 2013-10-11 03:56 AM, JT handyrems...@gmail.com wrote:

 I'm running into some issues developing a custom functionquery.

 My goal is to be able to implement a custom sorting technique.

 I have a field defined called resname, it is a single value str.

 Example: <str name="resname">/some/example/data/here/2013/09/12/testing.text</str>

 I would like to do a custom sort based on this resname field.
 Basically, I would like to parse out that date there (2013/09/12) and sort
 on that date.


 I've followed various tutorials
- http://java.dzone.com/news/how-write-custom-solr
-
 http://www.supermind.org/blog/756/how-to-write-a-custom-solr-functionquery


 Im at the point where my code compiles, runs, executes, etc. Solr is happy
 with my code.

 I have classes that inherit from ValueSorceParser and ValueSorce, etc. I've
 overrode parse and
 instantiated my class with ValueSource

 public ValueSource parse(FunctionQParser fqp) throws SyntaxError {
     return new MyCustomClass(fqp.parseValueSource());
 }

 public class MyCustomClass extends ValueSource {
     ValueSource source;

     public MyCustomClass(ValueSource source) {
         this.source = source;
     }

     public FunctionValues getValues(Map context, AtomicReaderContext readerContext)
             throws IOException {
         final FunctionValues sourceDV = source.getValues(context, readerContext);
         return new IntDocValues(this) {
             public int intVal(int doc) {
                 // parse the value of resname here
                 String value = sourceDV.strVal(doc);
                 // ...more stuff
             }
         };
     }
 }

 The issue I'm running into is that my call to sourceDV.strVal(doc) only
 returns part of the field, not all of it. It appears to be very random.

 I guess my actual question is, how do I access / reference the EXACT RAW
 value of a field, while writing a functionquery.

 Do I need to change my ValueSource to a String?, then somehow lookup the
 field name while inside my getValues call?

 Is there a way to access the raw field data , when referencing it as a
 FunctionValues?


 Maybe I'm going about this totally incorrectly?



AW: Edismax query parser and phrase queries

2012-12-03 Thread Tantius, Richard
Hi,
the use case we have in mind is that we would like to achieve exact matches for 
explicit phrases. Our users expect that an explicit phrase not only constrains 
the order of terms, but also the exact wording. Therefore, if we search on 
fields whose data type is not meant for exact matching, we need to change that 
for explicit phrases. In a usual query our qf default fields use advanced 
tokenization (for query processing and indexing), for example stemming via 
SnowballPorterFilterFactory. So our idea was to change the default search 
fields for explicit phrases to achieve exact matches, by using a simple data 
type such as “string” (StrField, without advanced options).

Extending our example from the last mail: 

qf=title text

Datatype of title, text, something like “text_advanced”:

<fieldType ...>
  <analyzer type="index"> <!-- (and also <analyzer type="query">) -->
    <filter class="solr.WordDelimiterFilterFactory" ... />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory" language="German2" />
...

Data type of the additional fields titleExact, textExact:
<fieldType name="string" class="solr.StrField" sortMissingLast="true"
omitNorms="true"/>

q="ran away from home" Cat Dog 

-transformTo-

q=( titleExact:"ran away from home" OR textExact:"ran away from home" ) Cat Dog.

Regards,
Richard.

BINSERV
Gesellschaft für interaktive Konzepte und neue Medien mbH
Software Engineer

Gotenstr. 7-9
53175 Bonn
Tel.: +49 (0)228 / 4 22 86 - 38 
Fax.: +49 (0)228 / 4 22 86 - 538
E-Mail:   r.tant...@binserv.de  
Web:  www.binserv.de
  www.binforcepro.de

Geschäftsführer: Rüdiger Jakob
Amtsgericht: Siegburg HRB 6765
Hauptsitz der Gesellschaft.: Pfarrer-Wichert-Str. 35, 53639 Königswinter
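
A minimal sketch of the phrase-rewriting pre-processing described above,
assuming the exact-match fields are named titleExact and textExact as in the
example; quoted phrases are redirected to the exact fields and everything else
is left for the normal qf fields:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhraseRewriter {
    private static final Pattern PHRASE = Pattern.compile("\"([^\"]+)\"");

    // "ran away from home" Cat Dog
    //  -> ( titleExact:"ran away from home" OR textExact:"ran away from home" ) Cat Dog
    static String rewrite(String q) {
        Matcher m = PHRASE.matcher(q);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String p = m.group(1);
            m.appendReplacement(out, Matcher.quoteReplacement(
                    "( titleExact:\"" + p + "\" OR textExact:\"" + p + "\" )"));
        }
        m.appendTail(out);
        return out.toString();
    }
}

As the thread notes, this string-level approach gets fragile once boolean
operators appear inside or around the phrases; extending the query parser is
the more robust (if heavier) alternative.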


- Original message -
Von: Jack Krupansky [mailto:j...@basetechnology.com] 
Gesendet: Freitag, 30. November 2012 23:04
An: solr-user@lucene.apache.org
Betreff: Re: Edismax query parser and phrase queries

I don’t have a simple answer for your stated issue, but maybe part of that is 
because I’m not so sure what the exact problem/goal is. I mean, what’s so 
special about phrase queries for your app that they need distinct processing 
from individual terms?

And, ultimately, what goal are you trying to achieve? Such as, how will the 
outcome of the query affect what users see and do.

-- Jack Krupansky

From: Tantius, Richard
Sent: Friday, November 30, 2012 8:44 AM
To: solr-user@lucene.apache.org
Subject: Edismax query parser and phrase queries

Hi,

we are using the edismax query parser and execute queries on specific fields by 
using the qf option. Like others, we are facing the problem that we do not want 
explicit phrase queries to be performed on some of the qf fields, and we also 
require additional search fields for those kinds of queries.

We tried to expand explicit phrases in a query by implementing some 
pre-processing logic, which did not seem very convenient.

So for example (let's assume qf=title text, and we want phrase queries to be 
performed on the additional fields titleAlt and textAlt): q="ran away from home" 
Cat Dog -transformTo- q=( titleAlt:"ran away from home" OR textAlt:"ran away 
from home" ) Cat Dog. Unfortunately this gets rather complicated if logic 
operators are involved within the query. Is there some kind of best practice; 
should we for example extend the query parser, or stick to our pre-processing 
approach?


Regards,
Richard.




Edismax query parser and phrase queries

2012-11-30 Thread Tantius, Richard
Hi,
we are using the edismax query parser and execute queries on specific fields by 
using the qf option. Like others, we are facing the problem that we do not want 
explicit phrase queries to be performed on some of the qf fields, and we also 
require additional search fields for those kinds of queries.
We tried to expand explicit phrases in a query by implementing some 
pre-processing logic, which did not seem very convenient.
So for example (let's assume qf=title text, and we want phrase queries to be 
performed on the additional fields titleAlt and textAlt): q="ran away from home" 
Cat Dog -transformTo- q=( titleAlt:"ran away from home" OR textAlt:"ran away 
from home" ) Cat Dog. Unfortunately this gets rather complicated if logic 
operators are involved within the query. Is there some kind of best practice; 
should we for example extend the query parser, or stick to our pre-processing 
approach?

Regards,
Richard.

Richard Tantius
Software Engineer


Gotenstr. 7-9
53175 Bonn
Tel.:+49 (0)228 / 4 22 86 - 38
Fax.:   +49 (0)228 / 4 22 86 - 538
E-Mail:   r.tant...@binserv.de
Web:      www.binserv.de
          www.binforcepro.de

Geschäftsführer: Rüdiger Jakob
Amtsgericht: Siegburg HRB 6765
Hauptsitz der Gesellschaft.: Pfarrer-Wichert-Str. 35, 53639 Königswinter




Re: Open Source Social (London) - 23rd Oct

2012-10-21 Thread Richard Marr
Last reminder... come along on Tuesday if you can! We'd love to meet you
and share search/NLP/scaling war stories.



 On 11 October 2012 21:59, Richard Marr richard.m...@gmail.com wrote:

 Hi all,

 The next Open Source Search Social is on the 23rd Oct at The Plough, in
 Bloomsbury.

 We usually get a good mix of regulars and newcomers, and a good mix of
 backgrounds and experience levels, so please come along if you can. As
 usual the format is completely open so we'll be talking about whatever is
 most interesting at any one particular moment... ooo, a shiny thing...

 Details and RSVP options on the Meetup page:
 http://www.meetup.com/london-search-social/events/86580442/

 Hope to see you there,

 Richard

 @richmarr








-- 
Richard Marr


Re: Open Source Social (London) - 23rd Oct

2012-10-16 Thread Richard Marr
Don't forget,

The London Search Social is on Tuesday next week. Come and grab a beer with
us and talk about Search, NLP, ML, Hadoop. All experience levels welcome.



On 11 October 2012 21:59, Richard Marr richard.m...@gmail.com wrote:

 Hi all,

 The next Open Source Search Social is on the 23rd Oct at The Plough, in
 Bloomsbury.

 We usually get a good mix of regulars and newcomers, and a good mix of
 backgrounds and experience levels, so please come along if you can. As
 usual the format is completely open so we'll be talking about whatever is
 most interesting at any one particular moment... ooo, a shiny thing...

 Details and RSVP options on the Meetup page:
 http://www.meetup.com/london-search-social/events/86580442/

 Hope to see you there,

 Richard

 @richmarr



Open Source Social (London) - 23rd Oct

2012-10-11 Thread Richard Marr
Hi all,

The next Open Source Search Social is on the 23rd Oct at The Plough, in
Bloomsbury.

We usually get a good mix of regulars and newcomers, and a good mix of
backgrounds and experience levels, so please come along if you can. As
usual the format is completely open so we'll be talking about whatever is
most interesting at any one particular moment... ooo, a shiny thing...

Details and RSVP options on the Meetup page:
http://www.meetup.com/london-search-social/events/86580442/

Hope to see you there,

Richard

@richmarr


Re: edismax not working in a core

2012-07-18 Thread Richard Frovarp

On 07/18/2012 11:20 AM, Erick Erickson wrote:

the ~2 is the mm parameter, I'm pretty sure. So I'd guess your configuration has
a mm parameter set on the core that isn't doing what you want.



I'm not setting the mm parameter or the q.op parameter. All three cores 
have a defaultOperator of OR. So I don't know where that would be coming 
from. However, if I specify a mm of 0, it appears to work just fine. 
I've added it as a default parameter to the select handler.


Thanks for pointing me in the right direction.

Richard


Re: edismax not working in a core

2012-07-18 Thread Richard Frovarp

On 07/18/2012 02:39 PM, Richard Frovarp wrote:

On 07/18/2012 11:20 AM, Erick Erickson wrote:

the ~2 is the mm parameter, I'm pretty sure. So I'd guess your
configuration has
a mm parameter set on the core that isn't doing what you want.



I'm not setting the mm parameter or the q.op parameter. All three cores
have a defaultOperator of OR. So I don't know where that would be coming
from. However, if I specify a mm of 0, it appears to work just fine.
I've added it as a default parameter to the select handler.

Thanks for pointing me in the right direction.

Richard


Okay, that's wrong. Term boosting isn't working either, and what I did 
above just turns everything into an OR query.


I did figure out the problem, however. In the core that wasn't working, 
one of the query field names wasn't correct. No errors were ever thrown, 
it just made the query behave in a very odd way.


I finally figured it out after debugging each field independent of each 
other.
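
A general lesson from this thread: a typo in a qf field produces no error,
only odd scoring and matching, so echoing the parsed query is the quickest
check. A small SolrJ sketch (field names illustrative):

import org.apache.solr.client.solrj.SolrQuery;

public class DebugEdismax {
    static SolrQuery build() {
        SolrQuery q = new SolrQuery("frovarp OR fee");
        q.set("defType", "edismax");
        q.set("qf", "title^5 mainContent^2 content");
        // the debug section of the response carries the parsedquery string,
        // where a misspelled qf field simply never appears
        q.set("debugQuery", "true");
        return q;
    }
}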


Re: edismax not working in a core

2012-07-17 Thread Richard Frovarp

On 07/14/2012 05:32 PM, Erick Erickson wrote:

Really hard to say. Try executing your query on the cores with
debugQuery=on and compare the parsed results (for this you
can probably just ignore the explain bits of the output, concentrate
on the parsed query).



Okay, for the example core from the project, the query was:

test OR samsung

parsedquery:
+(DisjunctionMaxQuery((id:test^10.0 | text:test^0.5 | cat:test^1.4 | 
manu:test^1.1 | name:test^1.2 | features:test | sku:test^1.5)) 
DisjunctionMaxQuery((id:samsung^10.0 | text:samsung^0.5 | 
cat:samsung^1.4 | manu:samsung^1.1 | name:samsung^1.2 | features:samsung 
| sku:samsung^1.5)))


For my core the query was:

frovarp OR fee

parsedquery:

+((DisjunctionMaxQuery((content:fee | title:fee^5.0 | 
mainContent:fee^2.0)) DisjunctionMaxQuery((content:frovarp | 
title:frovarp^5.0 | mainContent:frovarp^2.0)))~2)


What is that ~2? That's the difference. The third core that works 
properly also doesn't have the ~2.


edismax not working in a core

2012-07-13 Thread Richard Frovarp
I'm having trouble with edismax not working in one of my cores. I have 
three cores up and running, including the demo in Solr 3.6 on Tomcat 
7.0.27 on Java 1.6.


I can't get edismax to work on one of those cores, and it's configured 
very similar to the demo, which does work. I have different fields, but 
overall I'm not doing much different. I'm testing using a query with 
OR in it to try to get a union. On two of the cores, I get the union, 
on my third one I get a much smaller set than either term should return. 
If I tell the misbehaving core to have a defType of lucene, that does 
honor the OR.


What could I be possibly missing?

Thanks,
Richard


solrj library requirements: slf4j-jdk14-1.5.5.jar

2012-06-06 Thread Welty, Richard
the section of the solrj wiki page on setting up the class path calls for
slf4j-jdk14-1.5.5.jar which is supposed to be in a lib/ subdirectory.

i don't see this jar or any like it with a different version anywhere
in either the 3.5.0 or 3.6.0 distributions.

is it really needed or is this just slightly outdated documentation? the top of 
the page (which references solr 1.4) suggests this is true, and i see other 
docs on the web suggesting this is the case, but the first result that pops out 
of google for solrj is the apparently outdated wiki page, so i imagine others 
will encounter the same issue.

the other, more recent pages are not without issue as well, for example this 
page:

http://lucidworks.lucidimagination.com/display/solr/Using+SolrJ

references apache-solr-common which i'm not finding either. 

thanks,
   richard


Re: London OSS search social - meetup 6th June

2012-06-05 Thread Richard Marr
Quick reminder, we're meeting at The Plough in Bloomsbury tomorrow night. 
Details and RSVP on the meetup page:

http://www.meetup.com/london-search-social/events/65873032/

--
Richard Marr

On 3 Jun 2012, at 00:29, Richard Marr richard.m...@gmail.com wrote:

 
 Apologies for the short notice guys, we're meeting up at The Plough in 
 Bloomsbury on Wednesday 6th June.
 
 As usual the format is open and there's a healthy mix of experience and 
 backgrounds. Please come and share wisdom, ask questions, geek out, etc. in 
 the presence of beverages.
 
 -- 
 Richard Marr


London OSS search social - meetup 6th June

2012-06-02 Thread Richard Marr
Apologies for the short notice guys, we're meeting up at The Plough in
Bloomsbury on Wednesday 6th June.

As usual the format is open and there's a healthy mix of experience and
backgrounds. Please come and share wisdom, ask questions, geek out, etc. in
the presence of beverages.

-- 
Richard Marr


indexing documents from a git repository

2012-05-25 Thread Welty, Richard
i have a need to incrementally index documents (probably MS 
Office/OpenOffice/pdf files)
from a GIT repository using Tika. i'm expecting to run periodic pulls against 
the repository
to find new and updated docs.

does anyone have any experience and/or thoughts/suggestions that they'd like to 
share?

thanks,
  richard


using Tika (ExtractingRequestHandler)

2012-05-17 Thread Welty, Richard
i'm looking at using Tika to index a bunch of documents. the wiki page seems to 
be a little bit out of date (// TODO: this is out of date as of Solr 1.4 - 
dist/apache-solr-cell-1.4.jar and all of contrib/extraction/lib are needed) 
and it also looks a little incomplete.

is there an actual list of all the required jar files? i'm not sure they are in 
the same place in the 3.6.0 distribution as they were in 1.4, and having an 
actual list would be very helpful in figuring out where they are.

as for Sending Documents to Solr, is there any plan to address this todo: // 
TODO: describe the different ways to send the documents to solr (POST body, 
form encoded, remoteStreaming). this is really just a nice to have, i can see 
how to accomplish my goals using a method that is currently documented.

thanks,
   richard


RE: SOLR Security

2012-05-11 Thread Welty, Richard
in fact, there's a sample proxy.php on the ajax-solr web page which can easily 
be modified into a security layer. my solr servers only listen to requests 
issued by a narrow list of systems, and everything gets routed through a 
modified copy of the proxy.php file, which checks whether the user is logged 
in, and adds terms to the query to limit returned results to those the user is 
permitted to see.
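
For reference, the same insulation layer on a Java stack is a small servlet
that validates the session and appends a filter the client can never see or
remove, before forwarding the request; a minimal sketch, with a hypothetical
userId session attribute and acl index field (the forwarding itself is elided):

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class SolrProxyServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        Object userId = req.getSession().getAttribute("userId");
        if (userId == null) {                     // not logged in: refuse
            resp.sendError(HttpServletResponse.SC_FORBIDDEN);
            return;
        }
        // Restrict results server-side; the browser never sees this parameter.
        String fq = "acl:" + userId;
        // forwardToSolr(req, fq, resp);  // hypothetical helper that proxies
    }                                     // the rewritten request to Solr
}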


-Original Message-
From: Jan Høydahl [mailto:j...@hoydahl.no]
Sent: Fri 5/11/2012 9:45 AM
To: solr-user@lucene.apache.org
Subject: Re: SOLR Security
 
Hi,

There is nothing stopping you from pointing Ajax-SOLR to a URL on your 
app-server, which acts as a security insulation layer between the Solr backend 
and the world. In this (thin) layer you can analyze the input and choose 
carefully what to let through and not.

--
Jan Høydahl, search solution architect
Cominvent AS - www.facebook.com/Cominvent
Solr Training - www.solrtraining.com

On 11. mai 2012, at 06:37, Anupam Bhattacharya wrote:

 Yes, I agree with you.
 
 But Ajax-SOLR Framework doesn't fit in that manner. Any alternative
 solution ?
 
 Anupam
 
 On Fri, May 11, 2012 at 9:41 AM, Klostermeyer, Michael 
 mklosterme...@riskexchange.com wrote:
 
 Instead of hitting the Solr server directly from the client, I think I
 would go through your application server, which would have access to all
 the users data and can forward that to the Solr server, thereby hiding it
 from the client.
 
 Mike
 
 
 -Original Message-
 From: Anupam Bhattacharya [mailto:anupam...@gmail.com]
 Sent: Thursday, May 10, 2012 9:53 PM
 To: solr-user@lucene.apache.org
 Subject: SOLR Security
 
 I am using Ajax-Solr Framework for creating a search interface. The search
 interface works well.
 In my case, the results have document level security so by even indexing
 records with there authorized users help me to filter results per user
 based on the authentication of the user.
 
  The problem is that I always have to pass a parameter to the SOLR server,
  userid={xyz}, which one can figure out from the SOLR URL (ajax call URL)
  using the Firebug tool in the Net Console on Firefox, and can change this
  parameter value to see other users' records which he/she is not authorized
  to see. Basically it is a Cross Site Scripting issue.
 
  I have read about some approaches for Solr security, like Nginx with Jetty
  and .htaccess based security. Overall, what I understand from this is that we
  can restrict users from doing update/delete operations on SOLR, and we can
  also restrict the SOLR admin interface to certain IPs. But how can I
  restrict the {solr-server}/solr/select based results from access by
  different user ids?
 





RE: Searching by location - What do I send to Solr?

2012-05-03 Thread Welty, Richard
this is called geocoding and is properly a subject for GIS types.
it can be non trivial and the data you need to set it up may not be cheap.
i can't address the UK application, but i am somewhat familiar with the US
problem space, and in the US 5 digit postal (zip) codes don't map to
discreet locations, they map to bundles of postal delivery routes.

you need, i think, to research how UK postal codes actually work and what
data sources are available so that you can frame your problem appropriately.

richard

-Original Message-
From: Spadez [mailto:james_will...@hotmail.com]
Sent: Thu 5/3/2012 3:20 PM
To: solr-user@lucene.apache.org
Subject: Re: Searching by location - What do I send to Solr?
 
Hi,

This is quite a challenge. I know there are situations where you can get by
with the Google Maps API or similar, but they limit the number of requests and I
need more than that; unfortunately, for the full service they charge a
fortune!

So, going back to my question, does anyone have any ideas or suggestions for
a good solution?

Search for "London" -> Convert London to Long/Lat -> Send Query to
Solr -> Return Results

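
Once the place name has been resolved to coordinates by whatever gazetteer
data you can obtain, the Solr side of the pipeline above is straightforward.
A minimal sketch, assuming a LatLonType field named location in the schema
and a hypothetical in-memory gazetteer standing in for the licensed data:

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;

public class GeoSearch {
    // Hypothetical gazetteer; real data would come from a licensed source.
    static final Map<String, double[]> GAZETTEER = new HashMap<String, double[]>();
    static { GAZETTEER.put("london", new double[]{51.507, -0.128}); }

    static SolrQuery near(String place, String userQuery) {
        double[] pt = GAZETTEER.get(place.toLowerCase()); // null check omitted
        SolrQuery q = new SolrQuery(userQuery);
        q.addFilterQuery("{!geofilt}");   // spatial filter (Solr 3.1+)
        q.set("sfield", "location");      // LatLonType field in the schema
        q.set("pt", pt[0] + "," + pt[1]); // resolved centre point
        q.set("d", "15");                 // radius in km
        return q;
    }
}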




response codes from http update requests

2012-05-01 Thread Welty, Richard
should i be concerned with the http response codes from update requests?

i can't find documentation on what values come back from them anywhere
(although maybe i'm not looking hard enough.) are they just http standard
with 200 for success and 400/500 for failures?

thanks,
   richard
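
They are standard HTTP codes: 200 for success, and 400/500 (plus 404 for a
wrong path) for failures, with the error detail in the response body. A
minimal sketch of checking them from Java (URL and document field are
illustrative):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class UpdateCheck {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:8983/solr/update");
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setDoOutput(true);
        con.setRequestMethod("POST");
        con.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        OutputStream out = con.getOutputStream();
        out.write("<add><doc><field name=\"id\">1</field></doc></add>"
                .getBytes("UTF-8"));
        out.close();
        int status = con.getResponseCode(); // 200 = success, 400/500 = failure
        if (status != 200) {
            // read con.getErrorStream() for the Solr error message
        }
    }
}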


pushing updates to solr from postgresql

2012-04-18 Thread Welty, Richard
i have a setup right this instant where the dataimporthandler is being used to 
pull data for an index from a postgresql server.

i'd like to switch over to push, and am looking for some validation of my 
approach.

i have perl installed as an untrusted language on my postgresql server and am 
planning to set up triggers on the tables where insert/update/delete operations 
should cause an update of the relevant solr indexes. the trigger functions will 
build xml in the format for UpdateXmlMessages and notify Solr via http requests.


is this sensible, or am i missing something easier?

also, does anyone have any thoughts about coordinating initial indexing/full 
reindexing via dataimporthandler with the trigger based push operations?

thanks,
   richard
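
The approach sounds workable; for reference, the update messages the triggers
need to emit are small. A sketch of the two payloads as built from Java (field
names illustrative; the PL/Perl functions would emit the same XML):

public class UpdateMessages {

    // <add> message for an INSERT/UPDATE trigger; Solr replaces the existing
    // document automatically when the uniqueKey matches. Values must be
    // XML-escaped before being spliced in.
    static String add(String id, String name) {
        return "<add><doc>"
             + "<field name=\"id\">" + id + "</field>"
             + "<field name=\"name\">" + name + "</field>"
             + "</doc></add>";
    }

    // <delete> message for a DELETE trigger.
    static String deleteById(String id) {
        return "<delete><id>" + id + "</id></delete>";
    }

    // POST either message to /solr/update, then send <commit/> (or use
    // commitWithin) once a batch of trigger notifications has gone through.
}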


RE: Solr Tomcat Install

2012-03-28 Thread Welty, Richard
1) you need apache velocity in the class path for tomcat

2) here's a way of dealing with these that may go quicker than asking on the 
mailing list every time they come up -- clip out the pertinent part of the stack 
trace (in this case
java.lang.NoClassDefFoundError: org/apache/velocity/context/Context
and do a google search on it. then look at the first couple of forum or blog 
posts that come up. 95+% of the time you'll find the answer very quickly, as 
these sorts of errors are rarely particularly new.

richard


-Original Message-
From: rdancy [mailto:rda...@wiley.com]
Sent: Wed 3/28/2012 1:20 PM
To: solr-user@lucene.apache.org
Subject: Solr Tomcat Install
 
Hello, I have configured Solr inside Tomcat and I get the following error
when I go to browser and click on the solr admin link:

HTTP Status 500 - Severe errors in solr configuration. Check your log files
for more detailed information on what may be wrong. If you want solr to
continue after configuration errors, change:
<abortOnConfigurationError>false</abortOnConfigurationError> in solr.xml
-
java.lang.NoClassDefFoundError: org/apache/velocity/context/Context
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:383)
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:425)
    at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:447)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1556)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1550)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1583)
    at org.apache.solr.core.SolrCore.initWriters(SolrCore.java:1466)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:556)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:463)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:130)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
    at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3696)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4343)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:525)
    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:626)
    at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
    at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1138)
    at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
    at org.apache.catalina.core.StandardHost.start(StandardHost.java:719)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
    at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
    at org.apache.catalina.core.StandardService.start(StandardService.java:516)
    at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:566)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413)
Caused by: java.lang.ClassNotFoundException: org.apache.velocity.context.Context
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:627)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    ... 42 more

type Status report

message Severe errors in solr configuration. Check your log files for more
detailed information on what may be wrong. If you want solr to continue
after configuration errors, change:
<abortOnConfigurationError>false</abortOnConfigurationError> in solr.xml

Re: Tags and Folksonomies

2012-03-24 Thread Richard Noble
Hi

I have not actually done this yet, but will need to do something similar.
We will also be using user tagging, and ratings to influence relevancy for
the searches.

I take it that you want something like: if a document has been tagged 8
times with the tag "tagvalue"
but only 4 times with the tag "othervalue", then you want to rate/boost the
tag "tagvalue" higher?

The route I plan to go down would be to store the tag value count against
the document, and
use a (possibly custom) function to boost accordingly.

Just a theory at this point, and I am sure that there may be better ways.

Hope it helps

Richard


On Fri, Mar 23, 2012 at 5:44 PM, Nishant Chandra
nishant.chan...@gmail.comwrote:

 Suppose I have content which has title and description. Users can tag
 content
 and search content based on tag, title and description. Tag has more
 weightage.

 Any inputs on how indexing and retrieval will work given there is content
 and tags using Solr? Has anyone implemented search based on collaborative
 tagging?

 Thanks,
 Nishant




-- 
*nix has users, Mac has fans, Windows has victims.
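
A sketch of the count-based boost described above, assuming the indexer stores
a tags field plus an integer tag_count field (both illustrative; per-tag-value
counts would instead need dynamic fields such as tagcount_*):

import org.apache.solr.client.solrj.SolrQuery;

public class TagBoostedSearch {
    static SolrQuery build(String userQuery) {
        SolrQuery q = new SolrQuery(userQuery);
        q.set("defType", "edismax");
        q.set("qf", "tags^3 title^2 description"); // tag matches weigh most
        q.set("bf", "log(sum(tag_count,1))");      // additive boost by tag count
        return q;
    }
}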


RE: org.apache.solr.common.SolrException: Internal Server Error

2012-03-21 Thread Welty, Richard
yes. when i have seen these, generally the full trace is good about including 
the exception that triggered the whole thing; you just need to look down the 
trace a ways to find it.

richard


-Original Message-
From: vybe3142 [mailto:vybe3...@gmail.com]
Sent: Wed 3/21/2012 4:58 PM
To: solr-user@lucene.apache.org
Subject: Re: org.apache.solr.common.SolrException: Internal Server Error
 
Try to obtain the server trace; that should tell you what specifically the
error is








Re: Apache solr issue after configuration

2012-03-16 Thread Richard Noble
Solr newbie here, but this looks familiar.

Another thing to make sure of is that the plugin jars are not already
loaded from the standard java classpath.
I had a problem with this in that some jars were being loaded by the
standard java classloader,
and some other plugins were being loaded by Solr,
so QueryResponseWriter was not an instance of
VelocityResponseWriter due to the classloader differences.

They should be loaded by Solr's classloader.

Regards

Richard

On Fri, Mar 16, 2012 at 1:24 PM, Erick Erickson erickerick...@gmail.comwrote:

 At a guess, you don't have any paths to solr dist. Try copying all the
 other lib
 directives from the example (not core) dir (adjusting paths as necessary).
 The
 error message indicates you aren't getting to
 /dist/apache-solr-velocity-3.5.0.jar

 Best
 Erick

 On Thu, Mar 15, 2012 at 9:48 AM, ViruS svi...@gmail.com wrote:
  Hello,
 
  I have still same problem after installation.
  Files are loaded:
 
  ~/appl/apache-solr-3.5.0/example $ java -Dsolr.solr.home=multicore/ -jar
  start.jar 21 | grep contrib
  INFO: Adding
 
 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/velocity-tools-2.0.jar'
  to classloader
  INFO: Adding
 
 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/commons-beanutils-NOTICE.txt'
  to classloader
  INFO: Adding
 
 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/velocity-tools-NOTICE.txt'
  to classloader
  INFO: Adding
 
 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/commons-collections-NOTICE.txt'
  to classloader
  INFO: Adding
 
 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/velocity-1.6.4.jar'
  to classloader
  INFO: Adding
 
 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/velocity-NOTICE.txt'
  to classloader
  INFO: Adding
 
 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/velocity-tools-LICENSE-ASL.txt'
  to classloader
  INFO: Adding
 
 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/commons-collections-3.2.1.jar'
  to classloader
  INFO: Adding
 
 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/commons-beanutils-1.7.0.jar'
  to classloader
  INFO: Adding
 
 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/commons-beanutils-LICENSE-ASL.txt'
  to classloader
  INFO: Adding
 
 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/velocity-LICENSE-ASL.txt'
  to classloader
  INFO: Adding
 
 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/commons-collections-LICENSE-ASL.txt'
  to classloader
 
 
  my config multicore/ac/conf/solrconfig.xml
 
  <config>
    <luceneMatchVersion>LUCENE_35</luceneMatchVersion>
    <lib dir="../../../contrib/velocity/lib" />
  ...
    <queryResponseWriter name="velocity" class="solr.VelocityResponseWriter"
        enable="true"/>
  </config>
 
  And I still get error:
 
  INFO: [ac] Opening new SolrCore at multicore/ac/,
  dataDir=multicore/ac/data/
  2012-03-15 13:18:11 org.apache.solr.core.JmxMonitoredMap init
  INFO: No JMX servers found, not exposing Solr information with JMX.
  2012-03-15 13:18:11 org.apache.solr.common.SolrException log
  SEVERE: org.apache.solr.common.SolrException: Error Instantiating
  QueryResponseWriter, solr.VelocityResponseWriter is not a
  org.apache.solr.response.QueryResponseWriter
 at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:427)
  ...
 
  What's wrong?
 
  Thanks in advanced for help!
 
 
 
  --
  Piotr (ViruS) Sikora
  vi...@cdt.pl
  svi...@gmail.com
  JID: vi...@ipc.net.pl




-- 
*nix has users, Mac has fans, Windows has victims.


disabling QueryElevationComponent

2012-03-05 Thread Welty, Richard
i googled and found numerous references to this, but no answers that went to my 
specific issues.

i have a solr 3.5.0 server set up that needs to index several different 
document types, there is no common unique key field. so i can't use the 
uniqueKey declaration and need to disable the QueryElevationComponent. 

when i set this up with the uniqueKey, i mapped ids from the various database 
tables to the id key temporarily just to get things working, but the results 
from queries are showing me i really have to get rid of that hack. so i 
commented out uniqueKey in schema.xml and commented out the 
QueryElevationComponent searchComponent and the associated requestHandler in 
solrconfig.xml

when i restart solr and go to the solr/admin/dataimport.jsp page to test, i get 
the 'missing core name in path' error.

so what further configuration changes are required to disable 
QueryElevationComponent? 

thanks,
   richard


RE: disabling QueryElevationComponent

2012-03-05 Thread Welty, Richard



Walter Underwood [mailto:wun...@wunderwood.org] writes:

You may be able to have unique keys. At Netflix, I found that there were 
collisions between the movie IDs and the person IDs. So, I put an 'm' at the 
beginning of each movie ID and a 'p' at the beginning of each person ID. Like 
magic, I had unique IDs.

did you do this with a transformer at index time, or in some other manner?

You should be able to disable the query elevation stuff by removing it from 
your solrconfig.xml.

the documentation certainly implies this, which is why i'm baffled. i see no 
reason
why removing the config should trigger the multiple core error when i only have 
the
default setup with one core.

richard



RE: disabling QueryElevationComponent

2012-03-05 Thread Welty, Richard
Walter Underwood [mailto:wun...@wunderwood.org] writes:
 
On Mar 5, 2012, at 1:16 PM, Welty, Richard wrote:

 Walter Underwood [mailto:wun...@wunderwood.org] writes:
 
 You may be able to have unique keys. At Netflix, I found that there were
 collisions between the movie IDs and the person IDs. So, I put an 'm' at
 the beginning of each movie ID and a 'p' at the beginning of each person
 ID. Like magic, I had unique IDs.

 did you do this with a transformer at index time, or in some other manner?

SQL should be able to do this, though it might not be portable. For MySQL is
 it something like:

select
   concat('m', movie_id) as id,

ok, thanks. i know how to do that in postgresql...

richard








London Open Source Search Social - Tuesday 18th October

2011-09-12 Thread Richard Marr
Hi all,

That's right, hold on to your hats, we're holding another London Search
Social on the 18th Oct.
http://www.meetup.com/london-search-social/events/33218292/

Venue is still TBD, but highly likely to be a quiet(ish) central London pub.

There's usually a healthy mix of experience and backgrounds, and pet topics
or show-n-tell projects are welcome.

Save the date, it'd be great to see you there.


-- 
Richard Marr


RE: Building a facet query in SolrJ

2011-08-11 Thread Simon, Richard T
Thanks! I actually found a page online that explained this.

-Rich

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Wednesday, August 10, 2011 4:01 PM
To: solr-user@lucene.apache.org
Cc: Simon, Richard T
Subject: RE: Building a facet query in SolrJ


: query.addFacetQuery(MyField + ":" + "\"" + uri + "\"");
	...
: But when I examine queryResponse.getFacetFields, it's an empty list, if 

facet.query constraints+counts do not come back in the facet.field 
section of the response. they come back in the facet.query section of 
the response (look at the XML in your browser and you'll see what i 
mean)...

https://lucene.apache.org/solr/api/org/apache/solr/client/solrj/response/QueryResponse.html#getFacetQuery%28%29


-Hoss
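
Putting the two answers in this thread together -- the {!term} parser
(suggested by Erik elsewhere in the thread) sidesteps escaping the ':' and '/'
characters in the URIs, and the counts come back from getFacetQuery() rather
than getFacetFields() -- a minimal SolrJ sketch using the thread's MyField:

import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FacetQueryExample {
    static void run(SolrServer server, String luceneQueryStr,
                    Iterable<String> desiredUris) throws Exception {
        SolrQuery query = new SolrQuery(luceneQueryStr);
        query.setFacet(true);
        query.setFacetMinCount(1);
        for (String uri : desiredUris) {
            // {!term} parses the URI as a single raw term, no escaping needed
            query.addFacetQuery("{!term f=MyField}" + uri);
        }
        QueryResponse resp = server.query(query);
        // keyed by the literal facet.query strings sent above
        Map<String, Integer> counts = resp.getFacetQuery();
    }
}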


Building a facet query in SolrJ

2011-08-10 Thread Simon, Richard T
Hi - I'm trying to do a (I think) simple facet query, but I'm not getting the 
results I expect. I have a field, MyField, and I want to get facets for 
specific values of that field. That is, I want a FacetField if MyField is 
"ABC", "DEF", etc. (a specific list of values), but not if MyField is any other 
value.

If I build my query like this:

SolrQuery query = new SolrQuery( luceneQueryStr );
query.setStart( request.getStartIndex() );
query.setRows( request.getMaxResults() );
query.setFacet(true);
query.setFacetMinCount(1);

query.addFacetField(MYFIELD);

for (String fieldValue : desiredFieldValues) {
    query.addFacetQuery(MYFIELD + ":" + fieldValue);
}


queryResponse.getFacetFields returns facets for ALL values of MyField. I 
figured that was because setting the facet field with addFacetField caused Solr 
to examine all values. But, if I take out that line, then getFacetFields 
returns an empty list.

I'm sure I'm doing something simple wrong, but I'm out of ideas right now.

-Rich






RE: Building a facet query in SolrJ

2011-08-10 Thread Simon, Richard T
Oops. I think I found it. My desiredFieldValues list has the wrong info. Knew 
there was something simple wrong.

From: Simon, Richard T
Sent: Wednesday, August 10, 2011 10:55 AM
To: solr-user@lucene.apache.org
Cc: Simon, Richard T
Subject: Building a facet query in SolrJ

Hi - I'm trying to do a (I think) simple facet query, but I'm not getting the 
results I expect. I have a field, MyField, and I want to get facets for 
specific values of that field. That is, I want a FacetField if MyField is 
"ABC", "DEF", etc. (a specific list of values), but not if MyField is any other 
value.

If I build my query like this:

SolrQuery query = new SolrQuery( luceneQueryStr );
query.setStart( request.getStartIndex() );
query.setRows( request.getMaxResults() );
query.setFacet(true);
query.setFacetMinCount(1);

query.addFacetField(MYFIELD);

for (String fieldValue : desiredFieldValues) {
    query.addFacetQuery(MYFIELD + ":" + fieldValue);
}


queryResponse.getFacetFields returns facets for ALL values of MyField. I 
figured that was because setting the facet field with addFacetField caused Solr 
to examine all values. But, if I take out that line, then getFacetFields 
returns an empty list.

I'm sure I'm doing something simple wrong, but I'm out of ideas right now.

-Rich






RE: Building a facet query in SolrJ

2011-08-10 Thread Simon, Richard T
I take it back. I didn't find it. I corrected my values and the facet queries 
still don't find what I want.

The values I'm looking for are URIs, so they look like: http://place.org/abc/def

I add the facet query like so:

query.addFacetQuery(MyField + ":" + "\"" + uri + "\"");


I print the query, just to see what it is:

Facet Query: MyField:"http://place.org/abc/def"

But when I examine queryResponse.getFacetFields, it's an empty list, if I do 
not set the facet field. If I set the facet field to MyField, then I get facets 
for ALL the values of MyField, not just the ones in the facet queries.

Can anyone help here?

Thanks.


From: Simon, Richard T
Sent: Wednesday, August 10, 2011 11:07 AM
To: Simon, Richard T; solr-user@lucene.apache.org
Subject: RE: Building a facet query in SolrJ

Oops. I think I found it. My desiredFieldValues list has the wrong info. Knew 
there was something simple wrong.

From: Simon, Richard T
Sent: Wednesday, August 10, 2011 10:55 AM
To: solr-user@lucene.apache.org
Cc: Simon, Richard T
Subject: Building a facet query in SolrJ

Hi - I'm trying to do a (I think) simple facet query, but I'm not getting the 
results I expect. I have a field, MyField, and I want to get facets for 
specific values of that field. That is, I want a FacetField if MyField is 
ABC, DEF, etc. (a specific list of values), but not if MyField is any other 
value.

If I build my query like this:

SolrQuery query = new SolrQuery( luceneQueryStr );
query.setStart( request.getStartIndex() );
query.setRows( request.getMaxResults() );
query.setFacet(true);
query.setFacetMinCount(1);

query.addFacetField(MYFIELD);

for (String fieldValue : desiredFieldValues) {
    query.addFacetQuery(MYFIELD + ":" + fieldValue);
}


queryResponse.getFacetFields returns facets for ALL values of MyField. I 
figured that was because setting the facet field with addFacetField caused Solr 
to examine all values. But, if I take out that line, then getFacetFields 
returns an empty list.

I'm sure I'm doing something simple wrong, but I'm out of ideas right now.

-Rich






RE: Building a facet query in SolrJ

2011-08-10 Thread Simon, Richard T
Hi -- I do get facets for all the values of MyField when I specify the facet 
field, but that's not what I want. I just want facets for a subset of the 
values of MyField. That's why I'm trying to use the facet queries, to just get 
facets for those values.


-Rich

-Original Message-
From: Erik Hatcher [mailto:erik.hatc...@gmail.com] 
Sent: Wednesday, August 10, 2011 2:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Building a facet query in SolrJ

Try making your queries, manually, to see this closer in action... 
q=MyField:uri and see what you get.  In this case, because your URI contains 
characters that make the default query parser unhappy, do this sort of query 
instead:

{!term f=MyField}uri

That way the query is parsed properly into a single term query.

I am a little confused below since you're faceting on MyField entirely 
(addFacetField) where you'd get the values of each URI facet query in that list 
anyway.

Erik

On Aug 10, 2011, at 13:42 , Simon, Richard T wrote:

 I take it back. I didn't find it. I corrected my values and the facet queries 
 still don't find what I want.
 
 The values I'm looking for are URIs, so they look like: 
 http://place.org/abc/def
 
 I add the facet query like so:
 
 query.addFacetQuery(MyField + ":" + "\"" + uri + "\"");
 
 
 I print the query, just to see what it is:
 
 Facet Query: MyField:"http://place.org/abc/def"
 
 But when I examine queryResponse.getFacetFields, it's an empty list, if I do 
 not set the facet field. If I set the facet field to MyField, then I get 
 facets for ALL the values of MyField, not just the ones in the facet queries.
 
 Can anyone help here?
 
 Thanks.
 
 
 From: Simon, Richard T
 Sent: Wednesday, August 10, 2011 11:07 AM
 To: Simon, Richard T; solr-user@lucene.apache.org
 Subject: RE: Building a facet query in SolrJ
 
 Oops. I think I found it. My desiredFieldValues list has the wrong info. Knew 
 there was something simple wrong.
 
 From: Simon, Richard T
 Sent: Wednesday, August 10, 2011 10:55 AM
 To: solr-user@lucene.apache.org
 Cc: Simon, Richard T
 Subject: Building a facet query in SolrJ
 
 Hi - I'm trying to do a (I think) simple facet query, but I'm not getting the 
 results I expect. I have a field, MyField, and I want to get facets for 
 specific values of that field. That is, I want a FacetField if MyField is 
 ABC, DEF, etc. (a specific list of values), but not if MyField is any 
 other value.
 
 If I build my query like this:
 
 SolrQuery query = new SolrQuery( luceneQueryStr );
 query.setStart( request.getStartIndex() );
 query.setRows( request.getMaxResults() );
 query.setFacet(true);
 query.setFacetMinCount(1);

 query.addFacetField(MYFIELD);

 for (String fieldValue : desiredFieldValues) {
     query.addFacetQuery(MYFIELD + ":" + fieldValue);
 }
 
 
 queryResponse.getFacetFields returns facets for ALL values of MyField. I 
 figured that was because setting the facet field with addFacetField caused 
 Solr to examine all values. But, if I take out that line, then getFacetFields 
 returns an empty list.
 
 I'm sure I'm doing something simple wrong, but I'm out of ideas right now.
 
 -Rich
 
 
 
 



Highlighting map use unique key field?

2011-06-20 Thread Simon, Richard T

Hi - A simple yes or no question, I think.

I want to retrieve highlighting result from a QueryResponse. I know to use the 
following:

Map<String, Map<String, List<String>>> highlighting = resp.getHighlighting();


Most of the examples I've seen use the document uid to extract the results 
like so:

String key = (String) resultDoc.getFieldValue(UID_FIELD);
Map<String, List<String>> map = highlighting.get(key);


I think this is the right way to go; however, I did see one code example that 
did things a bit differently: they defined a field named "query" and then used 
that field as the id field, like so:

solrQuery.setParam("fl", "query");

... perform query ...

String id = (String) resultDoc.getFieldValue("query");

Map<String, List<String>> highlightSnippets = 
    queryResponse.getHighlighting().get(id);


Our documents have no unique field right now. I can create one rather easily. 
However, because of the above example,  I've been asked to confirm that the map 
returned by highlighting requires/uses the unique key field defined in the 
schema.

So, yes or no: does the highlighting map require a unique key field? ("Yes" 
could mean "well, there are obscure ways to avoid it, but using the unique key 
is easier/better/more common".)

Thanks,

-Rich
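
For the record, Solr keys the highlighting section of the response by the
value of the schema's uniqueKey field, so adding one is the straightforward
route; the fl/"query" trick above appears to work only because that example
effectively stores a unique value in that field. A minimal SolrJ sketch,
assuming a uniqueKey named uid and a highlighted field named content:

import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class HighlightExample {
    static void run(SolrServer server, String userQuery) throws Exception {
        SolrQuery q = new SolrQuery(userQuery);
        q.setHighlight(true);
        q.addHighlightField("content");
        QueryResponse resp = server.query(q);
        for (SolrDocument doc : resp.getResults()) {
            String key = (String) doc.getFieldValue("uid"); // uniqueKey value
            Map<String, List<String>> snippets = resp.getHighlighting().get(key);
            // snippets maps field name -> highlighted fragments for this doc
        }
    }
}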



RE: getFieldValue always returns an ArrayList?

2011-06-16 Thread Simon, Richard T
Interesting. You guessed right. I changed "multivalued" to "multiValued" and 
all of a sudden I get Strings. But doesn't multiValued default to false? In my 
schema, I originally did not set multiValued. I only put in multivalued="false" 
after I experienced this issue. 

-Rich

For the record, I had a number of fields which never had settings for 
multivalued because none of them were multivalued and I expected the default to 
be false. When I experienced this problem, I added multivalued="false" to all 
of them. I still had the problem. So, I added a method to deal with the 
returned ArrayLists:

private Object getFieldValue(String field, SolrDocument document) {
    // every stored field is coming back wrapped in an ArrayList, so unwrap it
    ArrayList list = (ArrayList) document.getFieldValue(field);
    return list.get(0);
}


I deliberately did not test if the return Object was an ArrayList because I 
wanted to get an exception if any of them were Strings; I got no exceptions, so 
they were all returned as ArrayLists. 

I then changed one of the fields to use multiValued=false, and I got an 
exception, trying to cast String to ArrayList! So, I changed all the 
troublesome fields to use multiValued, and changed my helper method to look 
like this:

private Object getFieldValue(String field, SolrDocument document) {
    Object o = document.getFieldValue(field);

    if (o instanceof ArrayList) {
        System.out.println("### Field " + field + " is an instance of ArrayList.");
        ArrayList list = (ArrayList) document.getFieldValue(field);
        return list.get(0);
    } else {
        if (!(o instanceof String)) {
            System.out.println("## ERROR");
        } else {
            System.out.println("### Field " + field + " is an instance of String.");
        }
        return o;
    }
}


Here's the output, interspersed with the schema definitions of the fields:

<field name="uri" type="string" indexed="true" stored="true" 
multiValued="false" required="true" />
### Field uri is an instance of String.

<field name="entity_label" type="string" indexed="false" stored="true" 
required="false" />
### Field entity_label is an instance of ArrayList.

<field name="institution_uri" type="string" indexed="true" stored="true" 
required="false" />
### Field institution_uri is an instance of ArrayList.

<field name="asserted_type_uri" type="string" indexed="true" stored="true" 
required="false" />
### Field asserted_type_uri is an instance of ArrayList.

<field name="asserted_type_label" type="text_eaglei" indexed="true" 
stored="true" required="false" />
### Field asserted_type_label is an instance of ArrayList.

<field name="provider_uri" type="string" indexed="true" stored="true" 
multiValued="false" required="false" />
### Field provider_uri is an instance of String.

<field name="provider_label" type="string" indexed="true" stored="true" 
multiValued="false" required="false" />
### Field provider_label is an instance of String.


As you can see, the ones with no declaration for multivalued are returned as 
ArrayLists, while the ones with multiValued=false are returned as Strings. 

So, it looks like there are two problems here: multivalued (small v) is not 
recognized, since using that in the schema still causes all fields to be 
returned as ArrayLists; and, multivalued does not default to false (or, at 
least, not setting it causes a field to be returned as an ArrayList, as though 
it were set to true).

-Rich


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, June 15, 2011 4:25 PM
To: solr-user@lucene.apache.org
Subject: Re: getFieldValue always returns an ArrayList?

Hmmm, I admit I'm not using embedded, and I'm using 3.2, but I'm
not seeing the behavior you are.

My question about reindexing could have been better stated, I
was just making sure you didn't have some leftover cruft where
your field was multi-valued from previous experiments, but if
you're reindexing each time that's not the problem.

Arrrh, camel case may be striking again. Try multiValued, not
multivalued

If that's still not it, can we see the code?

Best
Erick

On Wed, Jun 15, 2011 at 3:47 PM, Simon, Richard T
richard_si...@hms.harvard.edu wrote:
 We rebuild the index from scratch each time we start (for now). The fields in 
 question are not multi-valued; in fact, I explicitly set multi-valued to 
 false, just to be sure.

 Yes, this is SolrJ, using the embedded server, if that matters.

 Using Solr/Lucene 3.1.0.

 -Rich

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Wednesday, June 15, 2011 3:44 PM
 To: solr-user@lucene.apache.org
 Subject: Re: getFieldValue always returns an ArrayList?

 Did you perhaps change the schema but not re-index? I'm

RE: getFieldValue always returns an ArrayList?

2011-06-16 Thread Simon, Richard T
FYI: Using multiValued=false for all string fields results in the following 
output:

### Field uri is an instance of String.
### Field entity_label is an instance of String.
### Field institution_uri is an instance of String.
### Field asserted_type_uri is an instance of String.
### Field asserted_type_label is an instance of String.
### Field provider_uri is an instance of String.
### Field provider_label is an instance of String.

-Rich


RE: getFieldValue always returns an ArrayList?

2011-06-16 Thread Simon, Richard T
We haven't changed Solr versions. We've been using 3.1.0 all along.

Plus, I have some code that runs during indexing and retrieves the fields from 
a SolrInputDocument, rather than a SolrDocument. That code gets Strings without 
any problem, and always has, even without saying multiValued=false.

-Rich

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Thursday, June 16, 2011 2:18 PM
To: solr-user@lucene.apache.org
Cc: Simon, Richard T
Subject: RE: getFieldValue always returns an ArrayList?


: and all of a sudden I get Strings. But, doesn't multivalued default to 
: false? In my schema, I originally did not set multivalued. I only put in 
: multivalued=false after I experienced this issue.

That's dependent on the version of Solr, and it's is where the 
version property of the schema comes in.  (as the default behavior in 
solr changes, it does so dependent on what version you specify in your 
schema to prevent radical behavior changes if you upgrade but keep the 
same configs)...

<schema name="example" version="1.4">
  <!-- attribute "name" is the name of this schema and is only used for display 
purposes.
       Applications should change this to reflect the nature of the search 
collection.
       version="1.4" is Solr's version number for the schema syntax and 
semantics.  It should
       not normally be changed by applications.
       1.0: multiValued attribute did not exist, all fields are multiValued by 
nature
       1.1: multiValued attribute introduced, false by default 
       1.2: omitTermFreqAndPositions attribute introduced, true by default 
except for text fields.
       1.3: removed optional field compress feature
       1.4: default auto-phrase (QueryParser feature) to off
   -->



-Hoss


RE: getFieldValue always returns an ArrayList?

2011-06-16 Thread Simon, Richard T
Ah! That was the problem. The version was 1.0. I'll change it to 1.2. Thanks!

-Rich

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Thursday, June 16, 2011 2:33 PM
To: Simon, Richard T
Cc: solr-user@lucene.apache.org
Subject: RE: getFieldValue always returns an ArrayList?


: We haven't changed Solr versions. We've been using 3.1.0 all along.

but that's not what i'm talking about.  I'm talking about the schema 
version ... a specific property declared in your schema.xml file.

did you check it?

(even when people start with Solr X, they sometimes are using schema.xml 
files provided by external packages -- Drupal, wordpress, etc... -- and 
don't notice that those are from older versions)

: Plus, I have some code that runs during indexing and retrieves the 
: fields from a SolrInputDocument, rather than a SolrDocument. That code 
: gets Strings without any problem, and always has, even without saying 
: multiValued=false.

SolrInputDocuments are irrelevant.  they are used to index data, but they 
don't know anything about the schema.  A SolrInputDocument might be 
completely invalid because of multiple values for single-valued fields, or 
missing values for required fields, etc...   what comes back from a search 
*is* consistent with the schema (even when there is only one value stored 
in a multiValued field)

-Hoss


getFieldValue always returns an ArrayList?

2011-06-15 Thread Simon, Richard T
Hi - I am examining a SolrDocument I retrieved through a query. The field I am 
looking at is declared this way in my schema:

<field name="uri" type="string" indexed="true" stored="true" 
multivalued="false" required="true" />

I know multivalued defaults to false, but I set it explicitly because I'm 
seeing some unexpected behavior. I retrieve the value of the field like so:

final String resource = (String) document.getFieldValue("uri");


However, I get an exception because an ArrayList is returned. I confirmed that 
the returned ArrayList has one element with the correct value, but I thought 
getFieldValue would return a String if the field is single valued. When I index 
the document, I have some code that retrieves the same field in the same way 
from the SolrInputDocument, and that code works.

I looked at the code for SolrDocument.setField and it looks like the only way a 
field should be set to an ArrayList is if one is passed in by the code creating 
the SolrDocument. Why would it do that if the field is not multivalued?

Is this behavior expected?

-Rich
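
One defensive sketch, independent of the schema-version fix discussed in this
thread: SolrDocument.getFirstValue() returns the first element whether the
stored value came back as a single object or as a collection.

// works for both a bare value and an ArrayList (field name from the post)
final String resource = (String) document.getFirstValue("uri");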


RE: getFieldValue always returns an ArrayList?

2011-06-15 Thread Simon, Richard T
We rebuild the index from scratch each time we start (for now). The fields in 
question are not multi-valued; in fact, I explicitly set multi-valued to false, 
just to be sure.

Yes, this is SolrJ, using the embedded server, if that matters.

Using Solr/Lucene 3.1.0.

-Rich

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, June 15, 2011 3:44 PM
To: solr-user@lucene.apache.org
Subject: Re: getFieldValue always returns an ArrayList?

Did you perhaps change the schema but not re-index? I'm grasping
at straws here, but something like this might happen if part of
your index has that field as a multi-valued field

If that't not the problem, what version of solr are you using? I
presume this is SolrJ?

Best
Erick

On Wed, Jun 15, 2011 at 2:21 PM, Simon, Richard T
richard_si...@hms.harvard.edu wrote:
 Hi - I am examining a SolrDocument I retrieved through a query. The field I 
 am looking at is declared this way in my schema:

 <field name="uri" type="string" indexed="true" stored="true" 
 multivalued="false" required="true" />

 I know multivalued defaults to false, but I set it explicitly because I'm 
 seeing some unexpected behavior. I retrieve the value of the field like so:

 final String resource = (String) document.getFieldValue("uri");


 However, I get an exception because an ArrayList is returned. I confirmed 
 that the returned ArrayList has one element with the correct value, but I 
 thought getFieldValue would return a String if the field is single valued. 
 When I index the document, I have some code that retrieves the same field in 
 the same way from the SolrInputDocument, and that code works.

 I looked at the code for SolrDocument.setField and it looks like the only way 
 a field should be set to an ArrayList is if one is passed in by the code 
 creating the SolrDocument. Why would it do that if the field is not 
 multivalued?

 Is this behavior expected?

 -Rich



Re: London open source search social - 13th June

2011-06-09 Thread Richard Marr
Just a quick reminder that we're meeting on Monday. Come along if you're
around.


On 1 June 2011 13:27, Richard Marr richard.m...@gmail.com wrote:

 Hi guys,

 Just to let you know we're meeting up to talk all-things-search on Monday
 13th June. There's usually a good mix of backgrounds and experience levels
 so if you're free and in the London area then it'd be good to see you there.

 Details:
 7pm - The Elgin - 96 Ladbrooke Grove
 http://www.meetup.com/london-search-social/events/20387881/

 

 Greetings search geeks!

 We've booked the next meetup for the 13th June. As usual, the plan is to
 meet up and geek out over a friendly beer.

 I know my co-organiser René has been working on some interesting search
 projects, and I've recently left Empora to work on my own project so by June
 I should hopefully have some war stories about using @elasticsearch in
 production. The format is completely open though so please bring your own
 topics if you've got them.

 Hope to see you there!

 --
 Richard Marr


Re: Sorting algorithm

2011-06-03 Thread Richard Hodsdon
Hi Tomás

Thanks, that makes a lot of sense, and your math is sound.

It is working well. An if() function would be great, and it seems its coming
soon.

Richard

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-algorithm-tp3014549p3019077.html
Sent from the Solr - User mailing list archive at Nabble.com.


Sorting algorithm

2011-06-02 Thread Richard Hodsdon
Hi,

I want to do a similar sorting function query to the way reddit handles its
ranking.
I have the date stored in a 
<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"
precisionStep="6" positionIncrementGap="0"/>

I also have the number of twitter, facebook and reads from our site stored.
below is the pseudo code that I want to work out.

var t = (CreationDate - 1131428803) / 1000;
var x = FacebookCount + TwitterCount + VoteCount - DownVoteCount;
var y = 0;
if (x > 0) {
   y = 1;
} else if (x == 0) {
  y = 0;
} else if (x < 0) {
  y = -1;
}
var z = 1;
var absX = Math.abs(x);
if (absX >= 1) {
  z = absX;
}
var ranking = (Math.log(z) / Math.LN10) + ((y * t) / 45000);

I have no Java experience so I cannot re-write it as a custom function.
This is my current query I am trying to use.

http://127.0.0.1:8983/solr/select?q.alt=*:*&fq=content_type:news&start=0&rows=10&wt=json&indent=on&omitHeader=true
&fl=id,name,excerpt,timestamp,domain,source,facebook,twitter,read,imageheight
&defType=dismax
&tt=div(sub(_val_:timestamp,1131428803),1000)
&xx=sub(sum(facebook,twitter,read),0)
&yy=map(query($xx),1,,1,map(query($xx),0,0,0,map(query($xx),-,-1,-1,0)))
&zz=map(abs(query($xx)),-9,0,1)
&sort=sum(div(log(query($zz)),ln(10)),div(product(query($yy),query($tt)),45000))
desc

Currently I am getting errors relating to my date field when trying to
convert it from the TrieDate to timestamp with the _val_:MyDateField.

Also I wanted to know if there is another way to do this, and if my query is
even correct.

Thanks in advance

Richard


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-algorithm-tp3014549p3014549.html
Sent from the Solr - User mailing list archive at Nabble.com.
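
For reference, a plain-Java restatement of the pseudo-code above - a sketch
for sanity-checking the function query, assuming the creation date is
available as epoch milliseconds:

static double ranking(long creationDateMillis, long facebook, long twitter,
                      long votes, long downVotes) {
    double t = (creationDateMillis - 1131428803000L) / 1000.0; // seconds past the offset
    long x = facebook + twitter + votes - downVotes;
    int y = Long.signum(x);               // -1, 0 or 1
    double z = Math.max(Math.abs(x), 1);  // keep the log argument >= 1
    return Math.log10(z) + (y * t) / 45000.0;
}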


Re: Sorting algorithm

2011-06-02 Thread Richard Hodsdon
Thanks for the response,

You are correct, but my pseudo code was not.
this line
var t = (CreationDate - 1131428803) / 1000; 
should be 
var t = (CreationDate - now()) / 1000; 

This will cause the item's ranking to depreciate over time.

Richard


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-algorithm-tp3014549p3014961.html
Sent from the Solr - User mailing list archive at Nabble.com.


London open source search social - 13th June

2011-06-01 Thread Richard Marr
Hi guys,

Just to let you know we're meeting up to talk all-things-search on Monday
13th June. There's usually a good mix of backgrounds and experience levels
so if you're free and in the London area then it'd be good to see you there.

Details:
7pm - The Elgin - 96 Ladbrooke Grove
http://www.meetup.com/london-search-social/events/20387881/



Greetings search geeks!

We've booked the next meetup for the 13th June. As usual, the plan is to
meet up and geek out over a friendly beer.

I know my co-organiser René has been working on some interesting search
projects, and I've recently left Empora to work on my own project so by June
I should hopefully have some war stories about using @elasticsearch in
production. The format is completely open though so please bring your own
topics if you've got them.

Hope to see you there!

--
Richard Marr



-- 
Richard Marr


RE: spellcheck.collate returning all results

2011-05-24 Thread Richard Hodsdon
Hi,

Thanks this did the trick.
I am using SOLR 3.1, so I did not need to apply the first patch.

Richard

--
View this message in context: 
http://lucene.472066.n3.nabble.com/spellcheck-collate-returning-all-results-tp2975621p2979560.html
Sent from the Solr - User mailing list archive at Nabble.com.


spellcheck.collate returning all results

2011-05-23 Thread Richard Hodsdon
Hi,

I have been trying to set up spellchecking on our system using the
SpellCheckComponent.

According to the wiki, by using spellcheck.collate any fq parameters that are
passed through with the original query while doing spellcheck are honoured, so
the returned collation should yield results if it is re-run. So far this has
not been happening: I am getting a collation back, but if I re-run the query
passing through the collated q param it finds nothing.

My initial Query i as follows:
http://127.0.0.1:8983/solr/select?q=reeed%20bulll&spellcheck=true&spellcheck.collate=true&fq=content_type:post

and I get back in the spellcheck lst
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="reeed">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">5</int>
      <arr name="suggestion">
        <str>red</str>
      </arr>
    </lst>
    <lst name="bulll">
      <int name="numFound">1</int>
      <int name="startOffset">6</int>
      <int name="endOffset">11</int>
      <arr name="suggestion">
        <str>bull</str>
      </arr>
    </lst>
    <str name="collation">red bull</str>
  </lst>
</lst>

The issue is if I run the query again using the 'correct' query 

http://127.0.0.1:8983/solr/select?q=red%20bull&spellcheck=true&spellcheck.collate=true&fq=content_type:post&wt=json

I get no responses returned. This is because of my content_type:post, which
is filtering correctly. 

I have also run spellcheck.build=true 

I have set up my solrconfig.xml as follows.

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textgen</str>
  <lst name="spellchecker">
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="field">name</str>
    <str name="buildOnCommit">true</str>
    <str name="spellcheck.collate">true</str>
  </lst>
</searchComponent>

<requestHandler name="search" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

My schema.xml declares the textgen fieldType and the name field:
<field name="name" type="textgen" indexed="true" stored="true"/>
<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
        ignoreCase="true"
        words="stopwords.txt"
        enablePositionIncrements="true"
        />
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="0"
        catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Thanks

Richard



--
View this message in context: 
http://lucene.472066.n3.nabble.com/spellcheck-collate-returning-all-results-tp2975621p2975621.html
Sent from the Solr - User mailing list archive at Nabble.com.
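
For reference, the same request issued through SolrJ with the collation read
back (a sketch; the initialized server is assumed):

SolrQuery q = new SolrQuery("reeed bulll");
q.set("spellcheck", true);
q.set("spellcheck.collate", true);
q.addFilterQuery("content_type:post");
QueryResponse rsp = server.query(q);
SpellCheckResponse scr = rsp.getSpellCheckResponse();
if (scr != null) {
    String collated = scr.getCollatedResult(); // e.g. "red bull"
}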


Solr Newbie: Starting embedded server with multicore

2011-04-26 Thread Simon, Richard T

I'm just starting with Solr. I'm using Solr 3.1.0, and I want to use 
EmbeddedSolrServer with a multicore setup, even though I currently have only 
one core (various documents I read suggest starting that way even if you have 
one core, to get the better administrative tools supported by multicore).

I have two questions:

1.   Does the first code sample below start the server with multicore or 
not?

2.   Why does the first sample work while the second does not?

My solr.xml looks like this:

<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="mycore" sharedLib="lib">
    <core name="mycore" instanceDir="mycore" />
  </cores>
</solr>

It's in a directory called solrhome in war/WEB-INF.

I can get the server to come up cleanly if I follow an example in the Packt 
Solr book (p. 231), but I'm not sure if this enables multi-core or not:


  File solrXML = new File("war/WEB-INF/solrhome/solr.xml");

  String solrHome = solrXML.getParentFile().getAbsolutePath();
  String dataDir = solrHome + "/data";

  coreContainer = new CoreContainer(solrHome);

  SolrConfig solrConfig = new SolrConfig(solrHome, "solrconfig.xml", null);

  CoreDescriptor coreDescriptor = new CoreDescriptor(coreContainer, "mycore",
      solrHome);

  SolrCore solrCore = new SolrCore("mycore",
      dataDir + "/" + "mycore", solrConfig, null, coreDescriptor);

  coreContainer.register(solrCore, false);

  embeddedSolr = new EmbeddedSolrServer(coreContainer, "mycore");


The documentation on the Solr wiki says I should configure the 
EmbeddedSolrServer for multicore like this:

  File home = new File( "/path/to/solr/home" );
    File f = new File( home, "solr.xml" );
    CoreContainer container = new CoreContainer();
    container.load( "/path/to/solr/home", f );

    EmbeddedSolrServer server = new EmbeddedSolrServer( container, "core name 
as defined in solr.xml" );


When I try to do this, I get an error saying that it cannot find solrconfig.xml:


  File solrXML = new File("war/WEB-INF/solrhome/solr.xml");

  String solrHome = solrXML.getParentFile().getAbsolutePath();

  coreContainer = new CoreContainer();

  coreContainer.load(solrHome, solrXML);

  embeddedSolr = new EmbeddedSolrServer(coreContainer, "mycore");



The message says it is looking in an odd place (I removed my user name from 
this). Why is it looking in solrhome/mycore/conf for solrconfig.xml? Both that 
and my schema.xml are in solrhome/conf. How can I point it at the right place? 
I tried adding 
REMOVED\workspace-Solr\institution-webapp\war\WEB-INF\solrhome\conf to the 
classpath, but got the same result:


SEVERE: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in 
classpath or 
'REMOVED\workspace-Solr\institution-webapp\war\WEB-INF\solrhome\mycore\conf/',
 cwd=REMOVED\workspace-Solr\institution-webapp
  at 
org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:268)
  at 
org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:234)
  at org.apache.solr.core.Config.init(Config.java:141)
  at org.apache.solr.core.SolrConfig.init(SolrConfig.java:132)
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:430)
  at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316)
  at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207)
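
The stack trace shows CoreContainer.load() resolving solrconfig.xml relative
to the core's instanceDir, i.e. under solrhome/mycore/conf/. A sketch of the
wiki approach, assuming conf/ has been moved (or copied) from solrhome/conf
to solrhome/mycore/conf:

// layout expected by CoreContainer.load():
//   solrhome/solr.xml
//   solrhome/mycore/conf/solrconfig.xml
//   solrhome/mycore/conf/schema.xml
File solrXML = new File("war/WEB-INF/solrhome/solr.xml");
String solrHome = solrXML.getParentFile().getAbsolutePath();
CoreContainer coreContainer = new CoreContainer();
coreContainer.load(solrHome, solrXML);
EmbeddedSolrServer embeddedSolr = new EmbeddedSolrServer(coreContainer, "mycore");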





RE: DIH serialize

2011-01-25 Thread Papp Richard
Dear Stefan,

  thank you for your help! 
  Well, I wrote a small script - even if it's not JSON, it works:

  <script><![CDATA[
    function my_serialize(row)
    {
      var st = "";

      st = row.get('stt_id') + "||" +
           row.get('stt_name') + "||" +
           row.get('stt_date_from') + "||" +
           row.get('stt_date_to') + "||" +
           row.get('stt_monday') + "||" +
           row.get('stt_tuesday') + "||" +
           row.get('stt_wednesday') + "||" +
           row.get('stt_thursday') + "||" +
           row.get('stt_friday') + "||" +
           row.get('stt_saturday') + "||" +
           row.get('stt_sunday');

      var ret = new java.util.HashMap();
      ret.put('main_timetable', st);

      return ret;
    }
  ]]></script>

regards,
  Rich
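
For reference, a sketch of reading the packed field back with SolrJ and
splitting on the "||" delimiter used by my_serialize above (the field order
is assumed from the script):

String packed = (String) doc.getFirstValue("main_timetable");
String[] parts = packed.split("\\|\\|"); // escape | - it is a regex metacharacter
String sttId = parts[0];
String sttName = parts[1];
// ...and so on, in the order the script concatenated them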

-Original Message-
From: Stefan Matheis [mailto:matheis.ste...@googlemail.com] 
Sent: Tuesday, January 25, 2011 11:13
To: solr-user@lucene.apache.org
Subject: Re: DIH serialize

Rich,

i played around for a few minutes with Script-Transformers, but i have not
enough knowledge to get anything done right now :/
My Idea was: looping over the given row, which should be a Java HashMap or
something like that? and do sth like this (pseudo-code):

var row_data = []
for( var key in row )
{
  row_data.push( '"' + key + '" : "' + row[key] + '"' );
}
row.put( 'whatever_field', '{' + row_data.join( ',' ) + '}' );

Which should result in a json-object like {'key1':'value1', 'key2':'value2'}
- and that should be okay to work with?

Regards
Stefan


RE: DIH serialize

2011-01-24 Thread Papp Richard
Hi Dennis,

  thank you for your answer, but didn't understand why you say it doesn't need 
serialization. I'm with the option C.
  but the main question is, how to put into one field a result of many fields: 
SELECT * FROM.

thanks,
  Rich

-Original Message-
From: Dennis Gearon [mailto:gear...@sbcglobal.net] 
Sent: Monday, January 24, 2011 02:07
To: solr-user@lucene.apache.org
Subject: Re: DIH serialize

Depends on your process chain to the eventual viewer/consumer of the data.

The questions to ask are:
  A/ Is the data IN Solr going to be viewed or processed in its original form:
     --set stored = 'true'
     ---no serialization needed.
  B/ If it's going to be analyzed and searched for separate from any other field,
     the analyzing will put it into an unreadable form. If you need to see it, then
     ---set indexed=true and stored=true
     ---no serialization needed.
  C/ If it's NOT going to be viewed AS IS, and it's not going to be searched for AS IS,
     (i.e. other columns will be how the data is found), and you have another,
     serializable format:
     --set indexed=false and stored=true
     --serialize AS PER THE INTENDED APPLICATION,
       not sure that Solr can do that at all.
  D/ If it's NOT going to be viewed AS IS, BUT it's going to be searched for AS IS,
     (this column will be how the data is found), and you have another,
     serializable format:
     --you need to put it into TWO columns
     --A SERIALIZED FIELD
       --set indexed=false and stored=true
     --AN UNSERIALIZED FIELD
       --set indexed=false and stored=true
       --serialize AS PER THE INTENDED APPLICATION,
         not sure that Solr can do that at all.

Hope that helps!


Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.






RE: DIH serialize

2011-01-24 Thread Papp Richard
Hi Stefan,

  yes, this is exactly what I intend - I don't want to search in this field
- just quickly return me the result in a serialized form (the search criteria
is on other fields). Well, if I could serialize the data exactly as like the
PHP serialize() does I would be maximally satisfied, but any other form in
which I could compact the data easily into one field I would be pleased.
  Can anyone help me? I guess the script is quite a good way, but I don't
know which function should I use there to compact the data to be easily
usable in PHP. Or any other method?

thanks,
  Rich


DIH serialize

2011-01-23 Thread Papp Richard
Hi all,

 

  I wasted the last few hours trying to serialize some column values (from
mysql) into a Solr column, but I just can't find such a function. I'll use
the value in PHP - I don't know if it is possible to serialize in PHP style
at all. This is what I tried and works with a given factor:

 

in schema.xml:

   <field name="main_timetable" type="text" indexed="false"
stored="true" multiValued="true" />

 

in DIH xml:

 

<dataConfig>

  <script><![CDATA[
    function my_serialize(row)
    {
      row.put('main_timetable', row.toString());
      return row;
    }
  ]]></script>

 

.

 

  <entity name="main_timetable" query="
      SELECT * FROM shop_time_table stt WHERE stt.shop_id = '${shop.id}'"
      transformer="script:my_serialize">



.

 

 

  Can I use java directly in script (<script language="Java">) ?

  How could I achieve this? Or any other idea? 

  I need these values together (from a row) and I need them in PHP to handle
the result easily.

 

thanks,

  Rich



solr admin

2010-11-29 Thread Papp Richard
Hello,

  is there any way to specify in the solr admin other than fields? and I'm
not talking about the full interface which is also very limited.

  like: score, fl, fq, ...

  and yes, I know that I can use the url... which indeed is not too handy.

thanks,
  Rich
 

 



special sorting

2010-11-29 Thread Papp Richard
Hello,

  I have many pages with the same content in the search result (the result
is the same for some of the cities from the same county)... which means that
I have duplicate content.

  the filter query is something like: +locationId:(60 26a 39a) - for city
with ID 60
  and I get the same result for city with ID 62: +locationId:(62 26a 39a)
(cityID, countyID, countryID)

  how could I use a sorting to have different docs order in results for
different cities?
  (for the same city I need to have the same sort order always - it cannot
be a simple random...)

  could I use somehow the cityID parameter as boost or score ? I tried but
couldn't realise too much.

thanks,
  Rich
 

 



RE: solr admin

2010-11-29 Thread Papp Richard
in Solr admin (http://localhost:8180/services/admin/)
I can specify something like:

+category_id:200 +xxx:300

but how can I specify a sort option?

sort:category_id+asc

regards,
  Rich

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Monday, November 29, 2010 22:00
To: solr-user@lucene.apache.org
Subject: Re: solr admin

I honestly don't understand what you're asking here. Specify what
"in solr admin other than fields"? what is it you're trying to accomplish?

Best
Erick

On Mon, Nov 29, 2010 at 2:56 PM, Papp Richard ccode...@gmail.com wrote:

 Hello,

  is there any way to specify in the solr admin other than fields? and I'm
 nt talking about the full interface which is also very limited.

  like: score, fl, fq, ...

  and yes, I know that I can use the url... which indeed is not too handy.

 thanks,
  Rich





 

 



RE: special sorting

2010-11-29 Thread Papp Richard
Hmm, any clue how to use it? use the location_id somehow?

thanks,
  Rich

-Original Message-
From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com] 
Sent: Monday, November 29, 2010 22:08
To: solr-user@lucene.apache.org
Subject: Re: special sorting

Perhaps, depending on your domain logic you could use function queries to
achieve that.
http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function
Regards,
Tommaso

2010/11/29 Papp Richard ccode...@gmail.com

 Hello,

  I have many pages with the same content in the search result (the result
 is the same for some of the cities from the same county)... which means
 that
 I have duplicate content.

  the filter query is something like: +locationId:(60 26a 39a) - for city
 with ID 60
  and I get the same result for city with ID 62: +locationId:(62 26a 39a)
 (cityID, countyID, countryID)

  how could I use a sorting to have different docs order in results for
 different cities?
  (for the same city I need to have the same sort order always - it cannot
 be a simple random...)

  could I use somehow the cityID parameter as boost or score ? I tried but
 could't realise too much.

 thanks,
  Rich





 

 



RE: solr 4.0 - pagination

2010-11-07 Thread Papp Richard
Dear Yonik,

  this is fantastic, but can you tell when it will be ready?
  I would need this feature in two weeks. Is it possible to finish and make
an update in this time, or should I look for another solution concerning the
pagination (like implementing just a "more results" link instead of pagination)?

best regards,
  Rich

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Saturday, October 30, 2010 19:29
To: solr-user@lucene.apache.org
Subject: Re: solr 4.0 - pagination

On Sat, Oct 30, 2010 at 12:22 PM, Papp Richard ccode...@gmail.com wrote:
  I'm using Solr 4.0 with grouping (field collapsing), but unfortunately I
 can't solve the pagination.

It's not implemented yet, but I'm working on that right now.

-Yonik
http://www.lucidimagination.com
 

 



RE: solr 4.0 - pagination

2010-11-07 Thread Papp Richard
thank you very much Yonik! 
you are a magician!

regards,
  Rich

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Sunday, November 07, 2010 18:04
To: Papp Richard
Cc: solr-user@lucene.apache.org
Subject: Re: solr 4.0 - pagination

On Sun, Nov 7, 2010 at 10:55 AM, Papp Richard ccode...@gmail.com wrote:
  this is fantastic, but can you tell any time it will be ready ?

It already is ;-)  Grab the latest trunk or the latest nightly build.

-Yonik
http://www.lucidimagination.com
 

 



RE: solr 4.0 - pagination

2010-11-07 Thread Papp Richard
Hi Yonik,

  I've just tried the latest stable version from nightly build:
apache-solr-4.0-2010-11-05_08-06-28.war

  I have some concerns however: I have 3 documents; 2 in the first group, 1
in the 2nd group.
  
  1. I got for matches 3 - which is good, but I still don't know how many
groups I have. (using start = 0, rows = 10)
  2. as far as I see the start / rows is working now, but the matches is
returned incorrectly = it said matches = 3 instead of = 1, when I used
start = 1, rows = 1

  so can you help me, how to compute how many pages I'll have, because the
matches count can't be used for this.

regards,
  Rich

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Sunday, November 07, 2010 18:04
To: Papp Richard
Cc: solr-user@lucene.apache.org
Subject: Re: solr 4.0 - pagination

On Sun, Nov 7, 2010 at 10:55 AM, Papp Richard ccode...@gmail.com wrote:
  this is fantastic, but can you tell any time it will be ready ?

It already is ;-)  Grab the latest trunk or the latest nightly build.

-Yonik
http://www.lucidimagination.com
 

 



RE: solr 4.0 - pagination

2010-11-07 Thread Papp Richard
Hey Yonik,

  Sorry, I think the matches count is ok - because it probably always returns
the total document number - however I don't know how to compute the number of
pages.

thanks,
  Rich

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Sunday, November 07, 2010 18:04
To: Papp Richard
Cc: solr-user@lucene.apache.org
Subject: Re: solr 4.0 - pagination

On Sun, Nov 7, 2010 at 10:55 AM, Papp Richard ccode...@gmail.com wrote:
  this is fantastic, but can you tell any time it will be ready ?

It already is ;-)  Grab the latest trunk or the latest nightly build.

-Yonik
http://www.lucidimagination.com
 

 



RE: solr 4.0 - pagination

2010-11-07 Thread Papp Richard
I see. Let's assume that there are 1000 groups.
Can I safely use (with no negative impact on memory usage or speed)
start = 990, rows = 10 to get the last page?
Or will this not work, because you would need to compute all the groups up to
1000 in order to return the last 10, so the whole thing would be slow and
memory usage would increase considerably?

regards,
  Rich

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Sunday, November 07, 2010 21:54
To: Papp Richard
Cc: solr-user@lucene.apache.org
Subject: Re: solr 4.0 - pagination

On Sun, Nov 7, 2010 at 2:45 PM, Papp Richard ccode...@gmail.com wrote:
 Hi Yonik,

  I've just tried the latest stable version from nightly build:
 apache-solr-4.0-2010-11-05_08-06-28.war

  I have some concerns however: I have 3 documents; 2 in the first group, 1
 in the 2nd group.

  1. I got for matches 3 - which is good, but I still don't know how many
 groups I have. (using start = 0, rows = 10)
  2. as far as I see the start / rows is working now, but the matches is
 returned incorrectly = it said matches = 3 instead of = 1, when I used
 start = 1, rows = 1

matches is the number of documents before grouping, so start/rows or
group.offset/group.limit will not affect this number.

  so can you help me, how to compute how many pages I'll have, because the
 matches can't use for this.

Solr doesn't even know given the current algorithm, hence it can't
return that info.

The issue is that to calculate the total number of groups, we would
need to keep each group in memory (which could cause a big blowup if
there are tons of groups).  The current algorithm only keeps the top
10 groups (assuming rows=10) in memory at any one time, hence it has
no idea what the total number of groups is.

-Yonik
http://www.lucidimagination.com
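
For reference, a sketch of a trunk-era grouped request via SolrJ (the group
and group.field parameter names are from the FieldCollapsing wiki; the field
name company_id and the initialized server are assumptions). As explained
above, the response reports "matches" but no group total, so a page count
cannot be derived from it:

SolrQuery q = new SolrQuery("*:*");
q.set("group", true);
q.set("group.field", "company_id");
q.setStart(0); // first page of groups
q.setRows(10); // ten groups per page
QueryResponse rsp = server.query(q);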
 

 



solr 4.0 - pagination

2010-10-30 Thread Papp Richard
Hi all,

 

  I'm using Solr 4.0 with grouping (field collapsing), but unfortunately I
can't solve the pagination.

  Mainly there are two problems:

-  the query fields start & rows don't work anymore -
regardless of the values, it always returns the data as if start were 0
(start = 0)

-  the result contains just the total document number and not
the total number of groups

 

  Can anyone help me, how to solve this?

 

regards,

  Rich



RE: solr 4.0 - pagination

2010-10-30 Thread Papp Richard
Can you estimate please when it will be done?


thanks,
  Rich

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Saturday, October 30, 2010 19:29
To: solr-user@lucene.apache.org
Subject: Re: solr 4.0 - pagination

On Sat, Oct 30, 2010 at 12:22 PM, Papp Richard ccode...@gmail.com wrote:
  I'm using Solr 4.0 with grouping (field collapsing), but unfortunately I
 can't solve the pagination.

It's not implemented yet, but I'm working on that right now.

-Yonik
http://www.lucidimagination.com
 

 



London open-source search social - 28th Oct - NEW VENUE

2010-10-25 Thread Richard Marr
Just a reminder that we're meeting this Thursday near St James Park/Westminster.

Details on the Meetup page:
http://www.meetup.com/london-search-social/

Rich


-- 
Richard Marr


London open-source search social - 28th Nov - NEW VENUE

2010-10-20 Thread Richard Marr
Hi all,

We've booked a London Search Social for Thursday the 28th Sept. Come
along if you fancy geeking out about search and related technology
over a beer.

Please note that we're not meeting in the same place as usual. Details
on the meetup page.
http://www.meetup.com/london-search-social/

Rich


Re: London open-source search social - 28th Nov - NEW VENUE

2010-10-20 Thread Richard Marr
Wow, apologies for utter stupidity. Both subject line and body should
have read 28th OCT.



On 20 October 2010 15:42, Richard Marr richard.m...@gmail.com wrote:
 Hi all,

 We've booked a London Search Social for Thursday the 28th Sept. Come
 along if you fancy geeking out about search and related technology
 over a beer.

 Please note that we're not meeting in the same place as usual. Details
 on the meetup page.
 http://www.meetup.com/london-search-social/

 Rich




-- 
Richard Marr


RE: Grouping in solr ?

2010-09-30 Thread Papp Richard
I'm really sorry - thank you for the note.

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Tuesday, September 28, 2010 05:12
To: solr-user@lucene.apache.org
Subject: Re: Grouping in solr ?

: References:
: abcc5d9ce0798544a169c584b8f1447d230313c...@exchange01.toolbox.local
: In-Reply-To:
: abcc5d9ce0798544a169c584b8f1447d230313c...@exchange01.toolbox.local
: Subject: Grouping in solr ?

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is hidden in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking



-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!
 

 



Grouping in solr ?

2010-09-23 Thread Papp Richard
Hi all,

  is it possible somehow to group documents?
  I have services as documents, and I would like to show the filtered
services grouped by company. 
  So I filter services by given criteria, but I show the results grouped by
company.
  If I got 1000 services, maybe I need to show just 100 companies (this will
affect pagination as well), and how could I get the company info? Should I
store the company info in each service (I don't need the company info to be
indexed) ?

regards,
  Rich
 

 



RE: Grouping in solr ?

2010-09-23 Thread Papp Richard
thank you!
this is really helpful. just tried it and it's amazing.
do you know how trustworthy a nightly build (Solr 4) is?

Rich

-Original Message-
From: Markus Jelsma [mailto:markus.jel...@buyways.nl] 
Sent: Thursday, September 23, 2010 22:38
To: solr-user@lucene.apache.org
Subject: RE: Grouping in solr ?

http://wiki.apache.org/solr/FieldCollapsing

https://issues.apache.org/jira/browse/SOLR-236
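
As a sketch of what a grouped request could look like with the trunk
(4.0-dev) result grouping described on that wiki page - the company_id
field is an assumption here, and the parameter names are worth
re-checking against the build you actually download:

  /solr/select?q=<your service criteria>
      &group=true
      &group.field=company_id
      &group.limit=1

With grouping on, rows counts groups (companies) rather than individual
services, which is what pagination needs, and storing (without indexing)
the company display fields on each service document is a simple way to
get the company info back in the same response.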

 
-Original message-
From: Papp Richard ccode...@gmail.com
Sent: Thu 23-09-2010 21:29
To: solr-user@lucene.apache.org; 
Subject: Grouping in solr ?

Hi all,

 is it possible somehow to group documents?
 I have services as documents, and I would like to show the filtered
services grouped by company.
 So I filter services by the given criteria, but show the results grouped
by company.
 If I get 1000 services, I may need to show just 100 companies (this will
affect pagination as well). How could I get the company info? Should I
store the company info in each service (I don't need the company info to
be indexed)?

regards,
 Rich


 



tricky range query?

2010-09-21 Thread Papp Richard
Hi all,

 

  shortly my problem: I want to filter services based on timetables, let's
consider the next timetable for a day:

 

on the date of 15.10.2010:

10:00 - 11:00

12:00 - 12:30

14:30 - 16:00

17:00 - 20:00

 

  how could I store the timetable in Solr and efficiently search it
(say, filter to those timetables which have availability at 15:00)?

  not to mention that each service has a duration (so if the service takes
90 mins, filtering by 15:00 shouldn't return the timetable above, because
there is not enough free time (just 60 mins in that example))

 

  how to solve this? any hints?

 

regards,

  Rich



RE: tricky range query?

2010-09-21 Thread Papp Richard
Hi Erik,

  first of all, thank you for your answer. Let me detail the amount of
data a bit:

- actually services belong to persons, and the timetable is per person (a
person can have multiple services).
- there will be around 10,000 persons (or maybe 100,000 - I would rather
say 100,000 now than have problems later)
- but the timetable can differ from week to week, so each person has many
timetables (one for each week) = so if they have timetables for ~3 months
(12 weeks)... 100,000 x 12 ~ 1,000,000 timetables... and each timetable
has 7 days... and each day has many periods (as someone books a service,
the timetable is modified, possibly leaving time gaps, like in the
example)... so all in all, is that too much data?
- I've checked the trie fields, but couldn't find much info. I don't know
if using them could be a solution or not - I'm not a Solr expert.

regards,
  Rich


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, September 21, 2010 14:40
To: solr-user@lucene.apache.org
Subject: Re: tricky range query?

How efficient it would be I don't know, but depending on how
many services you're talking here, efficiency may not be
that big of a deal...

But storing each interval as its own record along
with a duration should work. You could then form a query
like duration:[90 TO *] AND start_time:[* TO 1500] AND
end_time:[1500 TO *]. I'm not sure I'd want that kind of
query on a gigabyte of records...
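
As a rough sketch of that one-record-per-interval layout (times stored
as minutes since midnight to keep the arithmetic simple; all field names
here are only illustrative), the 14:30 - 16:00 interval would be one
document:

  <add>
    <doc>
      <field name="id">person42-20101015-870</field>
      <field name="date">2010-10-15</field>
      <field name="start_time">870</field>   <!-- 14:30 -->
      <field name="end_time">960</field>     <!-- 16:00 -->
      <field name="duration">90</field>      <!-- end_time - start_time -->
    </doc>
  </add>

and "a 90-minute booking starting at 15:00 fits" becomes something like:

  fq=date:2010-10-15 AND start_time:[* TO 900] AND end_time:[990 TO *]

with the end-time bound shifted so the full duration fits after the
requested start: the interval must reach 15:00 + 90 min = 16:30, so the
14:30 - 16:00 document above fails the test, as it should, since only
60 minutes remain after 15:00.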

But without knowing some more details, it's impossible to
say whether this would be at all suitable...

Best
Erick

On Tue, Sep 21, 2010 at 4:41 AM, Papp Richard ccode...@gmail.com wrote:

 Hi all,



  shortly my problem: I want to filter services based on timetables, let's
 consider the next timetable for a day:



 on the date of 15.10.2010:

 10:00 - 11:00

 12:00 - 12:30

 14:30 - 16:00

 17:00 - 20:00



  how could I store the timetable in Solr and efficiently search it
 (say, filter to those timetables which have availability at 15:00)?

  not to mention that each service has a duration (so if the service takes
  90 mins, filtering by 15:00 shouldn't return the timetable above,
  because there is not enough free time (just 60 mins in that example))



  how to solve this? any hints?



 regards,

  Rich


 

 



RE: tricky range query?

2010-09-21 Thread Papp Richard
Hi Erick,

  I don't really understand your question and what exactly the point is,
but anyway: yes - there is a DB where the data are stored, however the
scheduling is just a part of the whole picture. I intended to use Solr
for search / filtering results - the schedule (availability) is just one
filter in the whole search process. Does that make sense to you? May I
ask if you are a Solr specialist? I don't know how seriously I should
take your answers.

thank you,
  Rich

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, September 21, 2010 20:36
To: solr-user@lucene.apache.org
Subject: Re: tricky range query?

So it sounds like you're working on some kind of scheduling app? Which
makes me wonder why you're using SOLR. Much as I like it, this sounds
more like a database application than a search application. What am I
missing?

Best
Erick

On Tue, Sep 21, 2010 at 1:05 PM, Papp Richard ccode...@gmail.com wrote:

 Hi Erik,

  first of all, thank you for your answer. Let me detail the amount of
 data a bit:

 - actually services belong to persons, and the timetable is per person (a
 person can have multiple services).
 - there will be around 10,000 persons (or maybe 100,000 - I would rather
 say 100,000 now than have problems later)
 - but the timetable can differ from week to week, so each person has many
 timetables (one for each week) = so if they have timetables for ~3 months
 (12 weeks)... 100,000 x 12 ~ 1,000,000 timetables... and each timetable
 has 7 days... and each day has many periods (as someone books a service,
 the timetable is modified, possibly leaving time gaps, like in the
 example)... so all in all, is that too much data?
 - I've checked the trie fields, but couldn't find much info. I don't know
 if using them could be a solution or not - I'm not a Solr expert.

 regards,
  Rich


 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Tuesday, September 21, 2010 14:40
 To: solr-user@lucene.apache.org
 Subject: Re: tricky range query?

 How efficient it would be I don't know, but depending on how
 many services you're talking here, efficiency may not be
 that big of a deal...

 But storing each interval as its own record along
 with a duration should work. You could then form a query
 like duration:[90 TO *] AND start_time:[* TO 1500] AND
 end_time:[1500 TO *]. I'm not sure I'd want that kind of
 query on a gigabyte of records...

 But without knowing some more details, it's impossible to
 say whether this would be at all suitable...

 Best
 Erick

 On Tue, Sep 21, 2010 at 4:41 AM, Papp Richard ccode...@gmail.com wrote:

  Hi all,
 
 
 
   shortly my problem: I want to filter services based on timetables,
let's
  consider the next timetable for a day:
 
 
 
  on the date of 15.10.2010:
 
  10:00 - 11:00
 
  12:00 - 12:30
 
  14:30 - 16:00
 
  17:00 - 20:00
 
 
 
   how could I store the timetable in Solr and efficiently search it
  (say, filter to those timetables which have availability at 15:00)?

   not to mention that each service has a duration (so if the service
  takes 90 mins, filtering by 15:00 shouldn't return the timetable above,
  because there is not enough free time (just 60 mins in that example))
 
 
 
   how to solve this? any hints?
 
 
 
  regards,
 
   Rich
 
 





 

 



RE: trie

2010-09-21 Thread Papp Richard
thank you guys for the answers... now I have to check / read some docs ;)

Rich

-Original Message-
From: Simon Willnauer [mailto:simon.willna...@googlemail.com] 
Sent: Tuesday, September 21, 2010 23:00
To: solr-user@lucene.apache.org
Subject: Re: trie

2010/9/21 Péter Király kirun...@gmail.com:
 You can read about it in Lucene in Action second edition.
have a look at 
http://www.lucidimagination.com/developer/whitepaper/Whats-New-in-Apache-Lucene-3-0

page 4 to 8 should give you a good intro to the topic
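
(as a concrete reference point, the stock example schema.xml already
ships trie-based types along these lines - precisionStep is the
index-size-versus-range-query-speed knob the whitepaper explains:

  <fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
             omitNorms="true" positionIncrementGap="0"/>
  <field name="duration" type="tint" indexed="true" stored="true"/>

the duration field is just an example - declare your numeric fields with
a trie type like that and range queries on them use the trie encoding
automatically)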

simon

 Péter

 2010/9/21 Papp Richard ccode...@gmail.com:
  is there any good tutorial on what a trie is and how to use it? what I
 found on the net is really blurry.

 regards,
  Rich






 

 


