Re: How can I do efficient FSIndex lookup?

2019-09-05 Thread Mario Juric
Hi,

Thanks for responding.

I tried with a temporary FS where the key value was set, but I got every
annotation from the index, so that didn’t appear to change anything, and it
also broke my unit tests immediately. I also stepped through the iterator
implementation and found the construction of the iterator with an FS quite
complex, so it went over my head; I would need to spend more time to get a
deeper understanding of the underlying index implementation. I then tried with
an indexed FS, and this seemed to return the correct items, but it would be
awkward to have to add an FS to the index in order to retrieve something else
and then remove that FS from the index again. I am now also in doubt about the
insertion costs, but I haven’t measured that yet.

I am not sure how many people use a custom FSIndex, but currently the API
doesn’t support our kind of use case very well, which is a disappointment for
us. Does UIMA 3 improve on this? We are still on 2.x because we are waiting for
the next major DKPro release with UIMA 3 due to dependencies.

Thanks a lot and cheers,
Mario

> On 5 Sep 2019, at 23:42, Richard Eckart de Castilho wrote:
> 
> On 5. Sep 2019, at 23:40, Marshall Schor wrote:
>> 
>> The normal way to get the "binary search" kind of behavior is to get a plain
>> iterator over the sorted index, and then use the moveTo method, specifying a
>> target FS as the one to move to.  The target FS can be a "temporary" FS, one
>> that is never added to the indexes, itself; it is just used to supply values
>> used in the comparison.
> 
> Is there a way to do this using a "temporary" FS which does not take up CAS
> heap space in UIMAv2?
> 
> -- Richard



Re: How can I do efficient FSIndex lookup?

2019-09-05 Thread Richard Eckart de Castilho
On 5. Sep 2019, at 23:40, Marshall Schor wrote:
> 
> The normal way to get the "binary search" kind of behavior is to get a plain
> iterator over the sorted index, and then use the moveTo method, specifying a
> target FS as the one to move to.  The target FS can be a "temporary" FS, one
> that is never added to the indexes, itself; it is just used to supply values
> used in the comparison.

Is there a way to do this using a "temporary" FS which does not take up CAS heap
space in UIMAv2?

-- Richard

Re: How can I do efficient FSIndex lookup?

2019-09-05 Thread Marshall Schor
Perhaps the use of a filtered iterator went in the wrong direction.

The normal way to get the "binary search" kind of behavior is to get a plain
iterator over the sorted index, and then use the moveTo method, specifying a
target FS as the one to move to.  The target FS can be a "temporary" FS, one
that is never added to the indexes, itself; it is just used to supply values
used in the comparison.

With this, you can "jump to" the nearest element (see the javadocs for the exact
definition of this).
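
As a rough illustration of the idea, here is a minimal sketch in plain Java. It
deliberately does not use the UIMA API: Entry, ORDER, lowerBound and lookup are
hypothetical stand-ins for an indexed annotation, the index's sort order, the
"jump" step and the whole lookup. The probe object plays the role of the
temporary FS: it is never added to the list and only supplies values for the
comparison. The sketch assumes the probe should sort before every real entry
with the same key, so its value is set to Integer.MIN_VALUE; the exact
positioning rules of the real moveTo are in the javadocs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SortedIndexLookup {

    /** Hypothetical stand-in for an annotation with a key and a value feature. */
    static final class Entry {
        final String key;
        final int value;

        Entry(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Sort order of the index: key first, then value. */
    static final Comparator<Entry> ORDER =
        Comparator.comparing((Entry e) -> e.key).thenComparingInt(e -> e.value);

    /** Binary search for the first position whose element is not less than the probe. */
    static int lowerBound(List<Entry> sorted, Entry probe) {
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (ORDER.compare(sorted.get(mid), probe) < 0) {
                lo = mid + 1;   // element sorts before the probe, look to the right
            } else {
                hi = mid;       // element is at or after the probe, look to the left
            }
        }
        return lo;
    }

    /** Jump to the first entry with the given key, then stop when the key changes. */
    static List<Entry> lookup(List<Entry> sorted, String key) {
        // Temporary probe: used only for comparison, never added to the list.
        Entry probe = new Entry(key, Integer.MIN_VALUE);
        List<Entry> hits = new ArrayList<>();
        for (int i = lowerBound(sorted, probe);
             i < sorted.size() && sorted.get(i).key.equals(key);
             i++) {
            hits.add(sorted.get(i));
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Entry> index = new ArrayList<>();
        index.add(new Entry("b", 2));
        index.add(new Entry("a", 1));
        index.add(new Entry("b", 1));
        index.add(new Entry("c", 5));
        index.sort(ORDER);

        for (Entry e : lookup(index, "b")) {
            System.out.println(e.key + " -> " + e.value);
        }
    }
}
```

The lookup costs one binary search plus one step per matching element, instead
of a pass over the whole list as with a filter.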

Does this help?

When using UIMA version 3, the moveTo method can be made to ignore type
priorities in the ordering, which is what is wanted in many use cases.  See
http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select

-Marshall

On 9/4/2019 3:45 PM, Mario Juric wrote:
> Hi,
>
> I created a custom FSIndex for an annotation type in the hope of speeding up
> lookup based on one of its fields, but after some profiling I found that this
> doesn’t appear to be what I get. I specified the index to be sorted according
> to two fields, where the first is a key and the second is a value field.
> After creating a filtered iterator with the key field as one of the
> constraints, I thought it would do a quick lookup to the first element in the
> list that matches the key constraint. After all, the index is sorted
> according to that field, so I assumed that at least binary search is
> possible, but to my surprise that is not what happens. It seems to simply
> iterate through all elements and skip those that don’t match the constraint.
> There doesn’t seem to be any other way to jump efficiently to the first
> matching element in the index and then stop iterating when the key no longer
> matches.
>
> I am somewhat baffled by this, and it appears to me I could have achieved the 
> same using a normal select with some simple filtering, which kinda makes the 
> FSIndex redundant. There is another way to obtain an iterator, which takes a 
> FeatureStructure, but I am not sure if that is more efficient, and does this 
> mean that you create FeatureStructures for the sole purpose of lookup into 
> the index? I would appreciate if someone could explain this to me, thanks! :)
>
> Cheers,
> Mario


Re: DUCC without shared file system

2019-09-05 Thread Eddie Epstein
Unless all CLI/API submissions are done from the head node, DUCC still has
a dependency on a shared filesystem to authenticate such requests for
configurations where user processes run with user credentials.

On Wed, Sep 4, 2019 at 9:41 AM Lou DeGenaro wrote:

> The DUCC Book for the Apache-UIMA DUCC demo
> http://uima-ducc-demo.apache.org:42133/doc/duccbook.html has been updated
> with respect to Jira 6121.  In particular, see section 12.9 for an example
> use of ducc_rsync to install DUCC on an additional worker node when not
> using a shared filesystem for $DUCC_HOME.
>
> On Tue, Sep 3, 2019 at 5:06 PM Lou DeGenaro wrote:
>
> > I opened Jira https://issues.apache.org/jira/browse/UIMA-6121 to track
> > this issue.
> >
> >
> > On Tue, Sep 3, 2019 at 1:51 PM Lou DeGenaro wrote:
> >
> >> You do not need to do anything special to run DUCC without a shared
> >> filesystem.  Simply install it on a local filesystem.  However, there is
> >> one caveat.  If the user's log directory (e.g. for DUCC jobs) is not in a
> >> shared filesystem, then DUCC-Mon will not have access and the contents
> >> won't be viewable. I'll open a Jira to review the DUCC Book and fix/clarify
> >> shared file system requirements.
> >>
> >> Lou.
> >>
> >> On Tue, Sep 3, 2019 at 11:58 AM Wahed Hemati <hem...@em.uni-frankfurt.de> wrote:
> >>
> >>> Hi there,
> >>>
> >>> the release notes of DUCC 3.0.0 indicate that one major change is
> >>> that DUCC can now run without a shared file system.
> >>>
> >>> How do I set this up? In the Duccbook, however, it says that you need a
> >>> shared filesystem to add more nodes
> >>> (https://uima.apache.org/d/uima-ducc-3.0.0/duccbook.html#x1-22400012.9).
> >>>
> >>> Thanks in advance.
> >>>
> >>> -Wahed