Re: [Cdk-user] DepictionGenerator seems to create a dir named ?

2024-02-01 Thread John Mayfield
Just checked on a linux server and I see the directories.

[john@??? ~]$ find ~/.java
/home/john/.java/fonts
/home/john/.java/fonts/1.8.0_102
/home/john/.java/fonts/1.8.0_102/fcinfo-1-hal-RedHat-7.1.1503-en.properties
/home/john/.java/fonts/1.8.0_45
/home/john/.java/fonts/1.8.0_45/fcinfo-1-hal-RedHat-7.6.1810-en.properties

On Thu, 1 Feb 2024 at 16:05, John Mayfield 
wrote:

> I don't think you can stop it creating the file:
> https://habr.com/en/articles/735688/.
>
> Make sure you have a home directory and make sure the docker container has
> some fonts installed (see the cdk/depict Docker for an example).
>
> On Thu, 1 Feb 2024 at 16:03, Tim Dudgeon  wrote:
>
>> Yes, gradle is the build system. But at runtime there should be no Gradle
>> involved.
>>
>> On Thu, Feb 1, 2024 at 4:02 PM Tim Dudgeon  wrote:
>>
>>> Yes, I'm sure it's Java creating it.
>>> I'm just using the DepictionGenerator from compiled Java code that is
>>> run from in a Docker container.
>>>
>>> On Thu, Feb 1, 2024 at 3:59 PM John Mayfield <
>>> john.wilkinson...@gmail.com> wrote:
>>>
>>>> How are you running it? We don't create it explicitly, but it is likely
>>>> created in the depths of the JVM to support the font metrics needed to
>>>> generate a depiction.
>>>>
>>>> John
>>>>
>>>> On Thu, 1 Feb 2024 at 15:43, Tim Dudgeon  wrote:
>>>>
>>>>> I'm seeing a strange issue with a directory named ? (yes, that's a
>>>>> question mark) when I run code that uses DepictionGenerator. This seems to
>>>>> contain font information.
>>>>>
>>>>> [image: image.png]
>>>>>
>>>>> I'm blaming DepictionGenerator because I can't see what else could be
>>>>> doing this, but I could of course be wrong.
>>>>>
>>>>> Any idea why this is happening and suggestions for how to prevent
>>>>> it from happening?
>>>>> Of course now I know about it I can delete it afterwards, but ...
>>>>>
>>>>> Thanks
>>>>> Tim
>>>>>
>>>>>
>>>>>
>>>>> ___
>>>>> Cdk-user mailing list
>>>>> Cdk-user@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/cdk-user
>>>>>
>>>>


Re: [Cdk-user] DepictionGenerator seems to create a dir named ?

2024-02-01 Thread John Mayfield
I don't think you can stop it creating the file:
https://habr.com/en/articles/735688/.

Make sure you have a home directory and make sure the docker container has
some fonts installed (see the cdk/depict Docker for an example).
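
A quick way to check the "no home directory" case from inside the container is to look at what the JVM reports for user.home. The sketch below rests on an assumption not stated in this thread: that some JVMs report user.home as a literal "?" when the container user has no passwd entry, so the font cache (normally under ~/.java/fonts) ends up in a "?" directory relative to the working directory. The class and method names (HomeDirCheck, isUsableHome) are made up for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class HomeDirCheck {

    /** Returns true if the value looks like an existing, writable home directory. */
    static boolean isUsableHome(String home) {
        // Assumption: "?" is what the JVM reports when it cannot resolve a home.
        if (home == null || home.isEmpty() || home.equals("?")) {
            return false;
        }
        Path p = Paths.get(home);
        return Files.isDirectory(p) && Files.isWritable(p);
    }

    public static void main(String[] args) {
        String home = System.getProperty("user.home");
        if (isUsableHome(home)) {
            System.out.println("user.home looks fine: " + home);
        } else {
            System.out.println("user.home is not usable: '" + home
                    + "'; the font cache may be written to an odd place");
        }
    }
}
```

If the check fails, giving the container user a real home directory (or setting one explicitly with -Duser.home=... on the java command line) is probably the simplest thing to try.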

On Thu, 1 Feb 2024 at 16:03, Tim Dudgeon  wrote:

> Yes, gradle is the build system. But at runtime there should be no Gradle
> involved.
>
> On Thu, Feb 1, 2024 at 4:02 PM Tim Dudgeon  wrote:
>
>> Yes, I'm sure it's Java creating it.
>> I'm just using the DepictionGenerator from compiled Java code that is run
>> from in a Docker container.
>>
>> On Thu, Feb 1, 2024 at 3:59 PM John Mayfield 
>> wrote:
>>
>>> How are you running it? We don't create it explicitly, but it is likely
>>> created in the depths of the JVM to support the font metrics needed to
>>> generate a depiction.
>>>
>>> John
>>>
>>> On Thu, 1 Feb 2024 at 15:43, Tim Dudgeon  wrote:
>>>
>>>> I'm seeing a strange issue with a directory named ? (yes, that's a
>>>> question mark) when I run code that uses DepictionGenerator. This seems to
>>>> contain font information.
>>>>
>>>> [image: image.png]
>>>>
>>>> I'm blaming DepictionGenerator because I can't see what else could be
>>>> doing this, but I could of course be wrong.
>>>>
>>>> Any idea why this is happening and suggestions for how to prevent
>>>> it from happening?
>>>> Of course now I know about it I can delete it afterwards, but ...
>>>>
>>>> Thanks
>>>> Tim
>>>>
>>>>
>>>>
>>>>
>>>


Re: [Cdk-user] DepictionGenerator seems to create a dir named ?

2024-02-01 Thread John Mayfield
Are you using Gradle? There seem to be lots of reports of Gradle creating a
"?" directory, which happens if you don't have a home directory.

On Thu, 1 Feb 2024 at 15:58, John Mayfield 
wrote:

> How are you running it? We don't create it explicitly, but it is likely
> created in the depths of the JVM to support the font metrics needed to
> generate a depiction.
>
> John
>
> On Thu, 1 Feb 2024 at 15:43, Tim Dudgeon  wrote:
>
>> I'm seeing a strange issue with a directory named ? (yes, that's a
>> question mark) when I run code that uses DepictionGenerator. This seems to
>> contain font information.
>>
>> [image: image.png]
>>
>> I'm blaming DepictionGenerator because I can't see what else could be
>> doing this, but I could of course be wrong.
>>
>> Any idea why this is happening and suggestions for how to prevent it from
>> happening?
>> Of course now I know about it I can delete it afterwards, but ...
>>
>> Thanks
>> Tim
>>
>>
>>
>>
>


Re: [Cdk-user] DepictionGenerator seems to create a dir named ?

2024-02-01 Thread John Mayfield
How are you running it? We don't create it explicitly, but it is likely
created in the depths of the JVM to support the font metrics needed to
generate a depiction.

John

On Thu, 1 Feb 2024 at 15:43, Tim Dudgeon  wrote:

> I'm seeing a strange issue with a directory named ? (yes, that's a
> question mark) when I run code that uses DepictionGenerator. This seems to
> contain font information.
>
> [image: image.png]
>
> I'm blaming DepictionGenerator because I can't see what else could be
> doing this, but I could of course be wrong.
>
> Any idea why this is happening and suggestions for how to prevent it from
> happening?
> Of course now I know about it I can delete it afterwards, but ...
>
> Thanks
> Tim
>
>
>
>


Re: [Cdk-user] structure layout alignment

2024-01-19 Thread John Mayfield
Well, the code/logic was already in there, just not the convenience methods
for a common case :-).

On Fri, 19 Jan 2024 at 13:38, Christoph Steinbeck <
christoph.steinb...@gmail.com> wrote:

> Brilliant :)
> On the wish list for so long.
> Thanks so much!
>
> Kind regards,
>
> Chris
>
>
> > On 19. Jan 2024, at 13:13, John Mayfield 
> wrote:
> >
> > https://github.com/cdk/cdk/pull/1032
> >
> > Possibly some improvements could be made but should be reasonably
> flexible, I would make this the default on CDK depict.
> >
> > Best,
> >
> > On Wed, 17 Jan 2024 at 23:48, Uli Fechner  wrote:
> > Thank you John, as always your answer is much appreciated.
> >
> > I came across this issue some years ago, so I am aware that this is
> anything but simple. I'll certainly be very happy to test any
> implementation and provide feedback :)
> >
> > Uli Fechner
> > Senior Software Developer
> > Pending AI
> >
> >
> > u...@pending.ai
> > https://pending.ai/
> > The National Innovation Centre, Suite 112, 4 Cornwallis St., NSW 2015,
> Australia
> >
> >
> > On Thu, Jan 18, 2024 at 9:17 AM dpoly  wrote:
> > This was a major issue for me too. I was trying to fix a starting config
> (such as a ring or backbone) and show visually the effect of progressively
> adding atoms or bonds or groups. Take a glucose molecule, deform it, add
> another, make sucrose (or fructose). It really doesn’t seem to be set up to
> make that at all easy.
> >  Regards
> > David M Bennett FACS
> > Polyomino Games – Programming Languages and Players for Games and
> Puzzles -- http://www.polyomino.com
> >  From: John Mayfield 
> > Sent: Thursday, 18 January 2024 7:24 AM
> > To: Uli Fechner 
> > Cc: cdkuser 
> > Subject: Re: [Cdk-user] structure layout alignment
> >  To clarify a bit more, CDK already has the APIs to "fix" part of a
> molecule. So you set the coordinates and then "fix" those atoms you can
> generate the rest. However you need to know which atoms to fix and what the
> coordinates should be.
> >  On Wed, 17 Jan 2024 at 20:23, John Mayfield <
> john.wilkinson...@gmail.com> wrote:
> > Hi Uli,
> >  There is an open issue on cdk/depict from Noel. I'll try and take a
> look this week but it's not quite as simple as you might think. For basic
> you would not want to align ring atoms to chain atoms, but also changes in
> hybridisation cause issues (e.g. a cumulene vs alkane might be mapped).
> Likewise you probably only want the largest continuous part aligned, what
> if there are disconnected parts, etc.
> >  Here is how I've done it in the past in our (NextMove's) SmallWorld
> tool: https://gist.github.com/johnmay/b51fd51e2870554afa00ee75f668e91a
> > You can see I try and balance how aggressive you want the alignment to be.
> >  On Wed, 17 Jan 2024 at 06:47, Uli Fechner  wrote:
> > Hi,
> >  I would like to individually layout and then render several structures
> that are similar to each other (e.g., share a scaffold).
> >  Is there a way to ensure that these structures are not rotated, that
> is, that they are aligned in terms of their shared structural elements?
> >  And to take this one step further: ideally, I would like to do this
> using CDKDepict using its REST API interface. But that might be a separate
> question altogether.
> >  Any help is appreciated.
> >  Best
> > Uli
>
>


Re: [Cdk-user] structure layout alignment

2024-01-19 Thread John Mayfield
https://github.com/cdk/cdk/pull/1032

Some improvements could possibly be made, but it should be reasonably
flexible. I would make this the default on CDK Depict.

Best,

On Wed, 17 Jan 2024 at 23:48, Uli Fechner  wrote:

> Thank you John, as always your answer is much appreciated.
>
> I came across this issue some years ago, so I am aware that this is
> anything but simple. I'll certainly be very happy to test any
> implementation and provide feedback :)
>
> Uli Fechner
>
> Senior Software Developer
>
> Pending AI
>
>
>
> u...@pending.ai
> https://pending.ai/
> The National Innovation Centre, Suite 112, 4 Cornwallis St., NSW 2015,
> Australia
>
>
> On Thu, Jan 18, 2024 at 9:17 AM dpoly  wrote:
>
>> This was a major issue for me too. I was trying to fix a starting config
>> (such as a ring or backbone) and show visually the effect of progressively
>> adding atoms or bonds or groups. Take a glucose molecule, deform it, add
>> another, make sucrose (or fructose). It really doesn’t seem to be set up to
>> make that at all easy.
>>
>>
>>
>> Regards
>>
>> David M Bennett FACS
>> *--*
>>
>> *Polyomino Games –** Programming Languages and Players for Games and
>> Puzzles **-- http://www.polyomino.com <http://www.polyomino.com/>*
>>
>>
>>
>> *From:* John Mayfield 
>> *Sent:* Thursday, 18 January 2024 7:24 AM
>> *To:* Uli Fechner 
>> *Cc:* cdkuser 
>> *Subject:* Re: [Cdk-user] structure layout alignment
>>
>>
>>
>> To clarify a bit more, CDK already has the APIs to "fix" part of a
>> molecule. So you set the coordinates and then "fix" those atoms you can
>> generate the rest. However you need to know which atoms to fix and what the
>> coordinates should be.
>>
>>
>>
>> On Wed, 17 Jan 2024 at 20:23, John Mayfield 
>> wrote:
>>
>> Hi Uli,
>>
>>
>>
>> There is an open issue on cdk/depict from Noel. I'll try and take a look
>> this week but it's not quite as simple as you might think. For basic you
>> would not want to align ring atoms to chain atoms, but also changes in
>> hybridisation cause issues (e.g. a cumulene vs alkane might be mapped).
>> Likewise you probably only want the largest continuous part aligned, what
>> if there are disconnected parts, etc.
>>
>>
>>
>> Here is how I've done it in the past in our (NextMove's) SmallWorld tool:
>> https://gist.github.com/johnmay/b51fd51e2870554afa00ee75f668e91a
>>
>>
>>
>> You can see I try and balance how aggressive you want the alignment to be.
>>
>>
>>
>> On Wed, 17 Jan 2024 at 06:47, Uli Fechner  wrote:
>>
>> Hi,
>>
>>
>>
>> I would like to individually layout and then render several structures
>> that are similar to each other (e.g., share a scaffold).
>>
>>
>>
>> Is there a way to ensure that these structures are not rotated, that is,
>> that they are aligned in terms of their shared structural elements?
>>
>>
>>
>> And to take this one step further: ideally, I would like to do this using
>> CDKDepict using its REST API interface. But that might be a separate
>> question altogether.
>>
>>
>>
>> Any help is appreciated.
>>
>>
>>
>> Best
>>
>> Uli
>>
>>
>>


Re: [Cdk-user] structure layout alignment

2024-01-17 Thread John Mayfield
To clarify a bit more, CDK already has the APIs to "fix" part of a
molecule. So if you set the coordinates and then "fix" those atoms, you can
generate the rest. However, you need to know which atoms to fix and what the
coordinates should be.

On Wed, 17 Jan 2024 at 20:23, John Mayfield 
wrote:

> Hi Uli,
>
> There is an open issue on cdk/depict from Noel. I'll try and take a look
> this week but it's not quite as simple as you might think. For basic you
> would not want to align ring atoms to chain atoms, but also changes in
> hybridisation cause issues (e.g. a cumulene vs alkane might be mapped).
> Likewise you probably only want the largest continuous part aligned, what
> if there are disconnected parts, etc.
>
> Here is how I've done it in the past in our (NextMove's) SmallWorld tool:
> https://gist.github.com/johnmay/b51fd51e2870554afa00ee75f668e91a
>
> You can see I try and balance how aggressive you want the alignment to be.
>
> On Wed, 17 Jan 2024 at 06:47, Uli Fechner  wrote:
>
>> Hi,
>>
>> I would like to individually layout and then render several structures
>> that are similar to each other (e.g., share a scaffold).
>>
>> Is there a way to ensure that these structures are not rotated, that is,
>> that they are aligned in terms of their shared structural elements?
>>
>> And to take this one step further: ideally, I would like to do this using
>> CDKDepict using its REST API interface. But that might be a separate
>> question altogether.
>>
>> Any help is appreciated.
>>
>> Best
>> Uli
>>
>


Re: [Cdk-user] structure layout alignment

2024-01-17 Thread John Mayfield
Hi Uli,

There is an open issue on cdk/depict from Noel. I'll try and take a look
this week, but it's not quite as simple as you might think. For a start, you
would not want to align ring atoms to chain atoms, and changes in
hybridisation also cause issues (e.g. a cumulene vs an alkane might be mapped).
Likewise, you probably only want the largest continuous part aligned; what
if there are disconnected parts, etc.?
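
To make the "largest continuous part" point concrete, here is a minimal sketch of picking the largest connected group of mapped atoms before aligning anything. It is not the approach taken in the gist linked below and not a CDK API; atoms are plain integer indices, and the adjacency array and the class name LargestMappedPart are hypothetical, chosen only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

final class LargestMappedPart {

    /**
     * Returns the largest connected group of mapped atoms.
     * adj[v] lists the neighbours of atom v; mapped holds the atom indices
     * that were matched to the reference structure.
     */
    static Set<Integer> largestComponent(int[][] adj, Set<Integer> mapped) {
        Set<Integer> best = new HashSet<>();
        Set<Integer> seen = new HashSet<>();
        for (int start : mapped) {
            if (!seen.add(start)) {
                continue; // already part of an earlier component
            }
            Set<Integer> component = new HashSet<>();
            Deque<Integer> stack = new ArrayDeque<>();
            component.add(start);
            stack.push(start);
            // Depth-first walk that only steps onto mapped atoms.
            while (!stack.isEmpty()) {
                int v = stack.pop();
                for (int w : adj[v]) {
                    if (mapped.contains(w) && seen.add(w)) {
                        component.add(w);
                        stack.push(w);
                    }
                }
            }
            if (component.size() > best.size()) {
                best = component;
            }
        }
        return best;
    }
}
```

For a six-atom chain 0-1-2-3-4-5 where every atom except 2 is mapped, this keeps atoms 3, 4 and 5 and drops the smaller fragment 0-1. Only the atoms in the returned set would then be candidates for fixing.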

Here is how I've done it in the past in our (NextMove's) SmallWorld tool:
https://gist.github.com/johnmay/b51fd51e2870554afa00ee75f668e91a

You can see I try and balance how aggressive you want the alignment to be.

On Wed, 17 Jan 2024 at 06:47, Uli Fechner  wrote:

> Hi,
>
> I would like to individually layout and then render several structures
> that are similar to each other (e.g., share a scaffold).
>
> Is there a way to ensure that these structures are not rotated, that is,
> that they are aligned in terms of their shared structural elements?
>
> And to take this one step further: ideally, I would like to do this using
> CDKDepict using its REST API interface. But that might be a separate
> question altogether.
>
> Any help is appreciated.
>
> Best
> Uli
>


Re: [Cdk-user] clearing stereochemistry

2024-01-16 Thread John Mayfield
Velusamy is correct. The primary storage of stereochemistry is via Stereo
elements.

The bond display only controls how a bond is drawn. Your code would do what
you want if you generated the coordinates first and then cleared the bond
display/stereo info, but since a molecule parsed from SMILES has no
coordinates, these get computed automatically for you at depiction time, and
the wedges are assigned again from the stereo elements that are still present.

Best,
John

On Mon, 15 Jan 2024 at 18:26, Velusamy Velu  wrote:

> The 3rd line below works for me.
>
> final SmilesParser PARSER = new 
> SmilesParser(SilentChemObjectBuilder.getInstance());
> IAtomContainer mol = PARSER.parseSmiles("C[C@@H](C(=O)O)N");
> mol.setStereoElements(new ArrayList<>());
>
> Thanks
>
> Velusamy K. Velu
> 614-323-9649
>   
>   
>
>
> On Mon, Jan 15, 2024 at 11:09 AM Tim Dudgeon 
> wrote:
>
>> What is the best approach to clear stereochemistry from a molecule?
>> I think I was told to use:
>>
>> for (IBond bond : mol.bonds()) {
>>     bond.setStereo(IBond.Stereo.NONE);
>>     bond.setDisplay(IBond.Display.Solid);
>> }
>>
>> But if I use this for a molecule generated from SMILES it still is
>> depicted with a wedge bond.
>>
>> IAtomContainer mol = smilesParser.parseSmiles("C[C@@H](C(=O)O)N");
>>
>> for (IBond bond : mol.bonds()) {
>> bond.setStereo(IBond.Stereo.NONE);
>> bond.setDisplay(IBond.Display.Solid);
>> }
>> DepictionGenerator dg = new DepictionGenerator().withSize(512, 512)
>> .withAtomColors();
>> dg.depict(mol).writeTo("mol.png");
>>
>


Re: [Cdk-user] Reading V3000 SDF

2024-01-11 Thread John Mayfield
Nothing obvious, sorry; I'm not sure I can be much more helpful. You could
check out the CDK 1.4.19 tag, rebuild, and step through to where the error
occurs.

On Thu, 11 Jan 2024 at 10:35, Tim Dudgeon  wrote:

> Here is the V2000 file.
>
> On Thu, Jan 11, 2024 at 10:13 AM John Mayfield <
> john.wilkinson...@gmail.com> wrote:
>
>> So I think it's here:
>> https://github.com/cdk/cdk/blob/cdk-1.4.19/src/main/org/openscience/cdk/graph/invariant/EquivalentClassPartitioner.java#L398
>>
>> Which is deep in the algorithm rather than in the reader. Can you also
>> send the V2000 that you say works and I'll see if there is anything obvious
>> you can do to make the V3000 work.
>>
>> Testing the current version I don't see the error, the only significant
>> change was in 2013:
>> https://github.com/cdk/cdk/commit/0aa0b794f48cdc057db133eabbcd865775a0730b
>>
>> On Thu, 11 Jan 2024 at 09:53, Tim Dudgeon  wrote:
>>
>>> skip=true doesn't help.
>>>
>>> On Thu, Jan 11, 2024 at 8:56 AM John Mayfield <
>>> john.wilkinson...@gmail.com> wrote:
>>>
>>>> Hi Tim,
>>>>
>>>> Why are you forced to use 1.4? I remember I made lots of improvements
>>>> to the SDF reading over a decade ago (1.4 is now 10.5 years old) but these
>>>> would have been in 1.5 onwards. It doesn't look like you're doing anything
>>>> wrong but you could try adding skip=true to your constructor. This means if
>>>> it sees something it doesn't like it continues rather than stops iterating.
>>>>
>>>> Best,
>>>> John
>>>>
>>>> On Wed, 10 Jan 2024 at 16:31, Tim Dudgeon 
>>>> wrote:
>>>>
>>>>> I'm having difficulty reading V3000 SDF files.
>>>>> The IteratingMDLReader docs (
>>>>> https://cdk.github.io/cdk/1.4/docs/api/org/openscience/cdk/io/iterator/IteratingMDLReader.html)
>>>>> seem to suggest that it will read V3000, but maybe it has to be
>>>>> specifically told to use V3000 format (which would be a pain to work out)?
>>>>>
>>>>> I'm using it like this:
>>>>>
>>>>> File sdfFile = new File(file);
>>>>> IteratingMDLReader reader = new IteratingMDLReader(
>>>>> new FileInputStream(sdfFile),
>>>>> DefaultChemObjectBuilder.getInstance()
>>>>> );
>>>>>
>>>>> BTW, I'm forced into using an old 1.4 version for reasons out of my
>>>>> control.
>>>>> ___
>>>>> Cdk-user mailing list
>>>>> Cdk-user@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/cdk-user
>>>>>
>>>>


Re: [Cdk-user] Reading V3000 SDF

2024-01-11 Thread John Mayfield
So I think it's here:
https://github.com/cdk/cdk/blob/cdk-1.4.19/src/main/org/openscience/cdk/graph/invariant/EquivalentClassPartitioner.java#L398

Which is deep in the algorithm rather than in the reader. Can you also send
the V2000 that you say works and I'll see if there is anything obvious you
can do to make the V3000 work.

Testing the current version I don't see the error, the only significant
change was in 2013:
https://github.com/cdk/cdk/commit/0aa0b794f48cdc057db133eabbcd865775a0730b

On Thu, 11 Jan 2024 at 09:53, Tim Dudgeon  wrote:

> skip=true doesn't help.
>
> On Thu, Jan 11, 2024 at 8:56 AM John Mayfield 
> wrote:
>
>> Hi Tim,
>>
>> Why are you forced to use 1.4? I remember I made lots of improvements to
>> the SDF reading over a decade ago (1.4 is now 10.5 years old) but these
>> would have been in 1.5 onwards. It doesn't look like you're doing anything
>> wrong but you could try adding skip=true to your constructor. This means if
>> it sees something it doesn't like it continues rather than stops iterating.
>>
>> Best,
>> John
>>
>> On Wed, 10 Jan 2024 at 16:31, Tim Dudgeon  wrote:
>>
>>> I'm having difficulty reading V3000 SDF files.
>>> The IteratingMDLReader docs (
>>> https://cdk.github.io/cdk/1.4/docs/api/org/openscience/cdk/io/iterator/IteratingMDLReader.html)
>>> seem to suggest that it will read V3000, but maybe it has to be
>>> specifically told to use V3000 format (which would be a pain to work out)?
>>>
>>> I'm using it like this:
>>>
>>> File sdfFile = new File(file);
>>> IteratingMDLReader reader = new IteratingMDLReader(
>>> new FileInputStream(sdfFile),
>>> DefaultChemObjectBuilder.getInstance()
>>> );
>>>
>>> BTW, I'm forced into using an old 1.4 version for reasons out of my
>>> control.
>>>
>>


Re: [Cdk-user] Reading V3000 SDF

2024-01-11 Thread John Mayfield
Hi Tim,

Why are you forced to use 1.4? I remember I made lots of improvements to
the SDF reading over a decade ago (1.4 is now 10.5 years old) but these
would have been in 1.5 onwards. It doesn't look like you're doing anything
wrong but you could try adding skip=true to your constructor. This means if
it sees something it doesn't like it continues rather than stops iterating.
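
On the question of working out which format a record uses: the version stamp sits at the end of the counts line, which is the fourth line of each molblock, so it can be sniffed without any CDK code. A minimal sketch, assuming the record starts at the first header line; the class name MolfileVersion is made up for illustration, and very old V2000 files that omit the stamp come back as UNKNOWN.

```java
final class MolfileVersion {

    /** Returns "V2000", "V3000" or "UNKNOWN" based on the counts line. */
    static String detect(String molblock) {
        // \R matches any line break (\n, \r\n, \r).
        String[] lines = molblock.split("\\R", -1);
        if (lines.length < 4) {
            return "UNKNOWN"; // three header lines plus the counts line are required
        }
        String counts = lines[3].trim();
        if (counts.endsWith("V3000")) {
            return "V3000";
        }
        if (counts.endsWith("V2000")) {
            return "V2000";
        }
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        String record = "\n  example\n\n  0  0  0     0  0            999 V3000\n";
        System.out.println(detect(record));
    }
}
```

This only tells you which reader a record needs; it says nothing about whether a given CDK version can parse it.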

Best,
John

On Wed, 10 Jan 2024 at 16:31, Tim Dudgeon  wrote:

> I'm having difficulty reading V3000 SDF files.
> The IteratingMDLReader docs (
> https://cdk.github.io/cdk/1.4/docs/api/org/openscience/cdk/io/iterator/IteratingMDLReader.html)
> seem to suggest that it will read V3000, but maybe it has to be
> specifically told to use V3000 format (which would be a pain to work out)?
>
> I'm using it like this:
>
> File sdfFile = new File(file);
> IteratingMDLReader reader = new IteratingMDLReader(
> new FileInputStream(sdfFile),
> DefaultChemObjectBuilder.getInstance()
> );
>
> BTW, I'm forced into using an old 1.4 version for reasons out of my
> control.
>


Re: [Cdk-user] Generic Symbols

2023-11-07 Thread John Mayfield
In CDK they are called pseudo atoms. You can create them manually or load
them from an input.

In SMILES you can write [R1] (a CDK extension). The more
correct way in CXSMILES would be *C |$R1$|.

On Mon, 6 Nov 2023 at 21:42, Velusamy Velu  wrote:

> Hi Friends:
>
> I want to know how the generic symbols like X for halogens, R, R', R'' and
> Me, Et, Pr, Ar and Ph (Methyl, Ethyl, Propyl, Aryls, & Phenyl) etc are
> handled by CDK?
>
> Is there any good documentation in this regard? Your help is much
> appreciated.
>
> Thanks
>
> Velusamy K. Velu
> (614) 323-9649
>   
>   
>


Re: [Cdk-user] Clearing 3D coordinates before depiction

2023-09-23 Thread John Mayfield
Hi Tim,

If you just include the cdk-legacy package you should have the SMSD code.
Or you can use the latest version: https://github.com/asad/SMSD. Note that
all the package/class names changed, so it is not a straight drop-in. That's
actually why we deprecated the CDK version: it was going to be a
non-trivial update either way.

Best,
John

On Sat, 23 Sept 2023 at 11:58, Egon Willighagen 
wrote:

>
> Can you point us (well, John particularly) to the SMSD code you are using?
> I think if he can find the time, he can make some suggestions on how to
> replace it with more modern code.
>
> Egon
>
> On Sat, 23 Sept 2023 at 12:55, Tim Dudgeon  wrote:
>
>> Sorry for the late reply. I've been distracted.
>> Sorry, I meant DepictionGenerator. CDKMolDepict is my class that's
>> running this! Egon found it correctly.
>>
>> Regarding the version, yes, this is old code, and I can't remember all
>> the details.
>> I'm not able to update to the latest as I have a dependency on the
>> deprecated org.openscience.cdk:cdk-smsd module.
>> I did have a discussion some time ago with Egon about removing the
>> dependency on the deprecated code, but we never got that resolved.
>> So this means that I can only update to version 2.5, which I've now done.
>>
>> I still find problems with that version. For some reason that I can't
>> recall I'm using:
>>
>> StructureDiagramGenerator g = new StructureDiagramGenerator();
>> g.generateCoordinates(mol);
>>
>> to do the layout prior to calling DepictionGenerator().depict(mol)
>>
>> This is causing me problems when handling 2D or 3D molfiles.
>> I find I can work around it by doing IAtomContainer -> SMILES ->
>> IAtomContainer, which works for now.
>> I'll need to look more deeply at the code to work out better options.
>> There's a lot going on like MCS alignment and highlighting.
>>
>> Thanks for your help.
>>
>>
>> On Sat, Sep 23, 2023 at 10:45 AM Egon Willighagen <
>> egon.willigha...@gmail.com> wrote:
>>
>>> If you have that discussion here, I can update
>>> https://egonw.github.io/cdkbook/migration.html accordingly.
>>>
>>> Egon
>>>
>>> On Sat, 23 Sept 2023 at 11:43, John Mayfield <
>>> john.wilkinson...@gmail.com> wrote:
>>>
>>>> Hi Tim,
>>>>
>>>> It looks like you're using 5 year old CDK 2.2, is that correct?
>>>>
>>>> build.gradle:
>>>> > project.ext.set('cdkVersion', '2.2')
>>>>
>>>> It's likely just updating will fix the issue. It should be relatively
>>>> seamless but let me know if there are any issues and I'll tell you how to
>>>> fix it.
>>>>
>>>> On Sat, 23 Sept 2023 at 06:38, Egon Willighagen <
>>>> egon.willigha...@gmail.com> wrote:
>>>>
>>>>>
>>>>> Tim,
>>>>>
>>>>> I guess you are referring to
>>>>> https://github.com/InformaticsMatters/squonk/blob/master/components/cdk-lib/src/main/groovy/org/squonk/cdk/io/CDKMolDepict.java
>>>>>
>>>>> I also looked at the molfile reading in CDKMoleculeIOUtils and it
>>>>> looks correct to me, but the hydrogen adding may be a conflicting issue
>>>>> here. What is the actual exception (stacktrace) you get?
>>>>>
>>>>> Like John said, removing the 3D coordinates should not be needed (the
>>>>> CDK is designed to allow having both of them in parallel), but if you have
>>>>> to, run .setPoint3d(null) on each atom. That should do the trick.
>>>>>
>>>>> Egon
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, 20 Sept 2023 at 12:05, John Mayfield <
>>>>> john.wilkinson...@gmail.com> wrote:
>>>>>
>>>>>> I have no idea what CDKMolDepict is, a Knime thing?
>>>>>>
>>>>>> It's probably just not being updated. Testing the following on
>>>>>> master/main works as expected (DepictionGenerator computes the 2D as
>>>>>> needed, no need to clear the 3D):
>>>>>>
>>>>>> public static void main(String[] args) {
>>>>>> String molfile = "\n" +
>>>>>> "  MJ231200  \n" +
>>>>>> "\n" +
>>>>>> "  5  4  0  0  1  0  0  0  0  0999 V2000\n" +

Re: [Cdk-user] Clearing 3D coordinates before depiction

2023-09-23 Thread John Mayfield
Hi Tim,

It looks like you're using 5 year old CDK 2.2, is that correct?

build.gradle:
> project.ext.set('cdkVersion', '2.2')

It's likely just updating will fix the issue. It should be relatively
seamless but let me know if there are any issues and I'll tell you how to
fix it.

On Sat, 23 Sept 2023 at 06:38, Egon Willighagen 
wrote:

>
> Tim,
>
> I guess you are referring to
> https://github.com/InformaticsMatters/squonk/blob/master/components/cdk-lib/src/main/groovy/org/squonk/cdk/io/CDKMolDepict.java
>
> I also looked at the molfile reading in CDKMoleculeIOUtils and it looks
> correct to me, but the hydrogen adding may be a conflicting issue here.
> What is the actual exception (stacktrace) you get?
>
> Like John said, removing the 3D coordinates should not be needed (the CDK
> is designed to allow having both of them in parallel), but if you have to,
> run .setPoint3d(null) on each atom. That should do the trick.
>
> Egon
>
>
>
>
>
> On Wed, 20 Sept 2023 at 12:05, John Mayfield 
> wrote:
>
>> I have no idea what CDKMolDepict is, a Knime thing?
>>
>> It's probably just not being updated. Testing the following on
>> master/main works as expected (DepictionGenerator computes the 2D as
>> needed, no need to clear the 3D):
>>
>> public static void main(String[] args) {
>> String molfile = "\n" +
>> "  MJ231200  \n" +
>> "\n" +
>> "  5  4  0  0  1  0  0  0  0  0999 V2000\n" +
>> "    0.9718   -0.1139    0.6193 O   0  0  0  0  0  0  0  0  0  0  0  0\n" +
>> "    0.9448    0.0189   -0.2285 S   0  0  0  0  0  0  0  0  0  0  0  0\n" +
>> "   -0.0042    0.1584   -0.4371 C   0  0  0  0  0  0  0  0  0  0  0  0\n" +
>> "   -0.2882    0.8589   -0.1053 C   0  0  0  0  0  0  0  0  0  0  0  0\n" +
>> "    1.1863   -0.8418   -0.6272 C   0  0  0  0  0  0  0  0  0  0  0  0\n" +
>> "  1  2  2  0  0  0  0\n" +
>> "  2  3  1  0  0  0  0\n" +
>> "  3  4  1  0  0  0  0\n" +
>> "  2  5  1  0  0  0  0\n" +
>> "M  END\n";
>> IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
>> try (MDLV2000Reader mdlr = new MDLV2000Reader(new
>> StringReader(molfile))) {
>> IAtomContainer mol = mdlr.read(bldr.newAtomContainer());
>> new DepictionGenerator().depict(mol).writeTo("/tmp/tmp.svg");
>> } catch (IOException e) {
>> throw new RuntimeException(e);
>> } catch (CDKException e) {
>> throw new RuntimeException(e);
>> }
>> }
>>
>> On Tue, 19 Sept 2023 at 18:15, Tim Dudgeon  wrote:
>>
>>> I'm using StructureDiagramGenerator.generateCoordinates() to create a
>>> layout before using CDKMolDepict to generate images for molecules, and I've
>>> hit a problem when using molecules that already have 3D coordinates as
>>> StructureDiagramGenerator.generateCoordinates() does not generate a new 2D
>>> layout if 3D coordinates are present, and passing in a molecule with 3D
>>> coordinates seems to make CDKMolDepict crash badly.
>>>
>>> So, what is the best way to clear the 3D coordinates for a molecule, so
>>> that StructureDiagramGenerator.generateCoordinates() can do its job?
>>> ___
>>> Cdk-user mailing list
>>> Cdk-user@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/cdk-user
>>>
>> ___
>> Cdk-user mailing list
>> Cdk-user@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/cdk-user
>>
>
>
> --
> Inherited disorders can be hard to interpret when multiple biomarkers are
> involved. A network approach can help bring insight:
> https://doi.org/10.1186/s13023-023-02683-9
>
> --
> E.L. Willighagen
> Department of Bioinformatics - BiGCaT
> Maastricht University (http://www.bigcat.unimaas.nl/)
> Blog: https://chem-bla-ics.blogspot.com/
> Mastodon: https://scholar.social/@egonw
> PubList: https://orcid.org/-0001-7542-0286
>
___
Cdk-user mailing list
Cdk-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-user


Re: [Cdk-user] Clearing 3D coordinates before depiction

2023-09-20 Thread John Mayfield
I have no idea what CDKMolDepict is, a Knime thing?

It's probably just not being updated. Testing the following on master/main
works as expected (DepictionGenerator computes the 2D as needed, no need to
clear the 3D):

public static void main(String[] args) {
    String molfile = "\n" +
        "  MJ231200  \n" +
        "\n" +
        "  5  4  0  0  1  0  0  0  0  0999 V2000\n" +
        "    0.9718   -0.1139    0.6193 O   0  0  0  0  0  0  0  0  0  0  0  0\n" +
        "    0.9448    0.0189   -0.2285 S   0  0  0  0  0  0  0  0  0  0  0  0\n" +
        "   -0.0042    0.1584   -0.4371 C   0  0  0  0  0  0  0  0  0  0  0  0\n" +
        "   -0.2882    0.8589   -0.1053 C   0  0  0  0  0  0  0  0  0  0  0  0\n" +
        "    1.1863   -0.8418   -0.6272 C   0  0  0  0  0  0  0  0  0  0  0  0\n" +
        "  1  2  2  0  0  0  0\n" +
        "  2  3  1  0  0  0  0\n" +
        "  3  4  1  0  0  0  0\n" +
        "  2  5  1  0  0  0  0\n" +
        "M  END\n";
    IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
    try (MDLV2000Reader mdlr = new MDLV2000Reader(new StringReader(molfile))) {
        IAtomContainer mol = mdlr.read(bldr.newAtomContainer());
        new DepictionGenerator().depict(mol).writeTo("/tmp/tmp.svg");
    } catch (IOException e) {
        throw new RuntimeException(e);
    } catch (CDKException e) {
        throw new RuntimeException(e);
    }
}

On Tue, 19 Sept 2023 at 18:15, Tim Dudgeon  wrote:

> I'm using StructureDiagramGenerator.generateCoordinates() to create a
> layout before using CDKMolDepict to generate images for molecules, and I've
> hit a problem when using molecules that already have 3D coordinates as
> StructureDiagramGenerator.generateCoordinates() does not generate a new 2D
> layout if 3D coordinates are present, and passing in a molecule with 3D
> coordinates seems to make CDKMolDepict crash badly.
>
> So, what is the best way to clear the 3D coordinates for a molecule, so
> that StructureDiagramGenerator.generateCoordinates() can do its job?
> ___
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user
>


Re: [Cdk-user] Extended Fingerprint: what do the features represent?

2023-08-28 Thread John Mayfield
The features represent subgraphs of the input molecule. For a binary
fingerprint they are hashed (irreversibly) and there is a many-to-one
mapping between the subgraph and the hash (i.e. the value 14 you see). We
do not currently provide a general way to see what features hash to which
values but some fingerprints have an option to generate the features
"unfolded". We should add this option in more places since it can be useful.

Best Wishes,
John
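
To make the many-to-one folding described above concrete, here is a minimal
sketch in plain Java. It is not CDK's hashing: String.hashCode() stands in for
the real feature hash, the paths are written as simple text, and the class and
method names (FoldedFingerprintSketch, bitFor, fold) are made up for this
illustration. It keeps an "unfolded" table next to the bit set so you can see
which features landed on which bit.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: String.hashCode() is an assumed stand-in for the real hash.
final class FoldedFingerprintSketch {

    // Fold a feature (here a path written as text) into one of `size` bit positions.
    static int bitFor(String feature, int size) {
        return Math.floorMod(feature.hashCode(), size);
    }

    // Build the folded bit set and record, per bit, the features that hashed to it.
    static BitSet fold(List<String> features, int size,
                       Map<Integer, List<String>> unfolded) {
        BitSet bits = new BitSet(size);
        for (String feature : features) {
            int bit = bitFor(feature, size);
            bits.set(bit);
            unfolded.computeIfAbsent(bit, k -> new ArrayList<>()).add(feature);
        }
        return bits;
    }

    public static void main(String[] args) {
        List<String> paths = List.of("C", "C-C", "C-O", "C-C-O", "C=O", "C-C=O");
        Map<Integer, List<String>> unfolded = new TreeMap<>();
        // A tiny fingerprint size makes collisions (many features, one bit) likely.
        BitSet fp = fold(paths, 4, unfolded);
        System.out.println("bits set: " + fp);
        unfolded.forEach((bit, fs) -> System.out.println(bit + " <- " + fs));
    }
}
```

Going from a bit back to a feature is only possible because the table was
recorded while hashing; from the bit set alone the mapping cannot be recovered.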

On Mon, 28 Aug 2023 at 19:06, Chong Kim San Allen via Cdk-user <
cdk-user@lists.sourceforge.net> wrote:

> Dear Helpdesk,
>
>
>
> I have used CDK to generate the Extended Fingerprints for a couple of
> compounds and I found that certain features are common among my compounds.
> For example, “14” keeps showing up. I would like to know what is “14”? I
> know that the default path length is 7 so I was wondering if the feature is
> a chemical substructure? The default size for Extended Fingerprint is 1024
> so I was wondering if there is a way to figure out what each of the 1024
> features represents.
>
>
>
> Similarly, if I generated ECFP6 which has 2^32 features (count version),
> is there a way for me to figure out what each of those features are? If a
> feature appears to have a high count and I wanted to figure out what this
> feature was, is there a command I can use to find out what that feature
> represents?
>
>
>
> Thanks in advance for your help.
>
>
>
> Best,
> Allen
>
>
> ___
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user
>


Re: [Cdk-user] Null elements in atom/bond arrays

2023-07-07 Thread John Mayfield
It's a trade-off: most molecules will use that space, and when you are adding
things you don't want to keep growing the arrays. Memory works in cache lines
and pages, so trying to save a few bytes by compacting the rare cases is not
worth it.

The default in Java collections is 10 or so I believe (see
https://www.baeldung.com/java-list-capacity-array-size#:~:text=Technically%2C%20the%20default%20capacity%20(DEFAULT_CAPACITY,is%20added%20to%20the%20list.),
it turns out the avg drug like molecule is 26 atoms/bonds or so.
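
A small sketch of the trade-off, in plain Java. This is not AtomContainer2:
the class GrowableArraySketch, its doubling growth policy, and the counters
are assumptions made for illustration. It adds 26 items (the rough average
mentioned above) starting from capacity 1 and from capacity 20, and reports
how many reallocations were needed and how many slots are left holding null.

```java
import java.util.Arrays;

// Hypothetical stand-in for a container's backing array; not CDK code.
final class GrowableArraySketch {
    private String[] items;
    private int size;
    int resizes; // number of times the backing array was reallocated

    GrowableArraySketch(int initialCapacity) {
        items = new String[Math.max(1, initialCapacity)];
    }

    void add(String item) {
        if (size == items.length) {
            // Assumed growth policy: double the capacity when full.
            items = Arrays.copyOf(items, items.length * 2);
            resizes++;
        }
        items[size++] = item;
    }

    int capacity() { return items.length; }

    // Slots past `size` are allocated but unused; they hold null.
    int unusedSlots() { return items.length - size; }

    public static void main(String[] args) {
        for (int initial : new int[] {1, 20}) {
            GrowableArraySketch a = new GrowableArraySketch(initial);
            for (int i = 0; i < 26; i++) a.add("atom" + i);
            System.out.println("initial=" + initial
                    + " resizes=" + a.resizes
                    + " capacity=" + a.capacity()
                    + " unused=" + a.unusedSlots());
        }
        // prints: initial=1 resizes=5 capacity=32 unused=6
        //         initial=20 resizes=1 capacity=40 unused=14
    }
}
```

Either way some slots end up null; the larger starting size just trades a few
unused references for fewer copies while the molecule is being built.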

On Fri, 7 Jul 2023 at 10:39, FB  wrote:

> Hi,
>
> I noticed that small molecules like methane or ethane have null elements
> in their bond or atom arrays. I did some quick research and saw
> that the arrays in AtomContainer2 are expanded to a default size of 20.
> I wondered why the arrays are not trimmed after successfully creating an
> AtomContainer?
> I am just asking out of interest, as we have discussed this point in our
> working group.
>
> Best regards,
> Felix
>
>
>
> ___
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user
>


Re: [Cdk-user] discrepancy between CDK results and fingerprint extracted directly from pubchem website

2023-04-19 Thread John Mayfield
Hi Yihan,

Could you open an issue on GitHub? There are some small changes we could
make to match more closely.

The PubChem CACTVS fingerprint implementation is private and so it's not
possible to match exactly based on code. However it should be "relatively"
close to what has been documented:
https://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.txt

For some reason (we should change that) you need to make hydrogens explicit
for the fingerprint:

SmilesParser smipar = new SmilesParser(SilentChemObjectBuilder.getInstance());
IAtomContainer mol =
    smipar.parseSmiles("CC1(C(N2C(S1)C(C2=O)NC(=O)C(C3=CC=CC=C3)N)C(=O)O)C");
AtomContainerManipulator.convertImplicitToExplicitHydrogens(mol);
BitSet fp = new PubchemFingerprinter(SilentChemObjectBuilder.getInstance())
    .getBitFingerprint(mol)
    .asBitSet();
System.out.println(fp);

That takes care of (0,1,2):

{0, 1, 2, 9, 10, 11, 12, 14, 15, 18, 19, 20, 33, 129, 131, 132, 143, 145,
146, 178, 179, 255, 283, 284, 285, 286, 293, 299, 308, 332, 333, 338, 340,
344, 345, 349, 351, 352, 353, 355, 356, 365, 368, 370, 371, 374, 380, 384,
390, 391, 392, 393, 406, 412, 416, 420, 430, 434, 439, 440, 441, 443, 446,
451, 452, 464, 470, 489, 490, 507, 516, 520, 524, 528, 535, 536, 540, 549,
552, 556, 564, 566, 569, 570, 578, 579, 580, 582, 584, 586, 592, 595, 597,
599, 602, 603, 607, 608, 611, 613, 617, 618, 633, 634, 637, 640, 643, 645,
646, 656, 658, 659, 660, 664, 668, 677, 678, 679, 683, 684, 688, 692, 696,
704, 708, 709, 710}

The other different bits are to do with which ringset we use:

213 >= 1 any ring size 7
215 >= 1 saturated or aromatic nitrogen-containing ring size 7
216 >= 1 saturated or aromatic heteroatom-containing ring size 7

IIRC PubChem/CACTVS substructure keys use a different cycle definition
(based on shortest cycle through triples i.e. atom-bond-atom-bond-atom)
rather than SSSR/MCB. We didn't have the option to find these when the
fingerprint was first written but we do now. We can make this small change:

PubchemFingerprinter.java:

    public CountRings(IAtomContainer m) {
        // ringSet = Cycles.sssr(m).toRingSet(); // wrong
        ringSet = Cycles.tripletShort(m).toRingSet();
    }

and get the expected bits set:

{0, 1, 2, 9, 10, 11, 12, 14, 15, 18, 19, 20, 33, 129, 131, 132, 143, 145,
146, 178, 179, 213, 215, 216, 255, 283, 284, 285, 286, 293, 299, 308, 332,
333, 338, 340, 344, 345, 349, 351, 352, 353, 355, 356, 365, 368, 370, 371,
374, 380, 384, 390, 391, 392, 393, 406, 412, 416, 420, 430, 434, 439, 440,
441, 443, 446, 451, 452, 464, 470, 489, 490, 507, 516, 520, 524, 528, 535,
536, 540, 549, 552, 556, 564, 566, 569, 570, 578, 579, 580, 582, 584, 586,
592, 595, 597, 599, 602, 603, 607, 608, 611, 613, 617, 618, 633, 634, 637,
640, 643, 645, 646, 656, 658, 659, 660, 664, 668, 677, 678, 679, 683, 684,
688, 692, 696, 704, 708, 709, 710}

I suggest we add an option to the fingerprint to use the correct ring set, but
we should also check for other discrepancies in PubChem (i.e. please open a
GitHub issue).
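
If you want to check for other discrepancies yourself, comparing two
fingerprints is just a bit-set XOR. The sketch below is plain Java; the class
name FingerprintDiffSketch and its helpers are made up for illustration, and
reading a '0'/'1' string with index 0 as the first character is an assumption
about how the string was written out. The example uses a shortened subset of
the bit lists above.

```java
import java.util.BitSet;

// Illustrative helper, not part of CDK.
final class FingerprintDiffSketch {

    // Parse a fingerprint written as '0'/'1' characters (assumed: index 0 first).
    static BitSet fromBinaryString(String s) {
        BitSet bits = new BitSet(s.length());
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '1') bits.set(i);
        }
        return bits;
    }

    // Build a bit set from a list of set-bit indices.
    static BitSet of(int... indices) {
        BitSet bits = new BitSet();
        for (int i : indices) bits.set(i);
        return bits;
    }

    // Bits that are set in exactly one of the two fingerprints.
    static BitSet diff(BitSet a, BitSet b) {
        BitSet d = (BitSet) a.clone();
        d.xor(b);
        return d;
    }

    public static void main(String[] args) {
        BitSet computed = of(0, 1, 2, 178, 179, 255);
        BitSet expected = of(0, 1, 2, 178, 179, 213, 215, 216, 255);
        System.out.println(diff(computed, expected)); // prints {213, 215, 216}
    }
}
```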

John

On Wed, 19 Apr 2023 at 09:09, Yihan Wu  wrote:

> Hi,
>
> I've come across a discrepancy between the pubchem fingerprint obtained
> through CDK (calculated from SMILES) and the pubchem fingerprint extracted
> directly from the pubchem website. For example, the Canonical SMILES of
>  compound Ampicillin (pubchem CID 6249) is
> CC1(C(N2C(S1)C(C2=O)NC(=O)C(C3=CC=CC=C3)N)C(=O)O)C.
> The calculation of pubchem fingerprint based on this SMILES by CDK is
>
> 001100111100010110010110001100010000010110001110100011000101110111001011001010001010100010001010001111010011100010100111100010001000100110001100100010001010011000111010101010010101001100011001010001100110010010010110101110001000111100011000100010001000100011100
> The pubchem fingerprint extracted from pubchem website for this compound is
>
> 

Re: [Cdk-user] Getting warning from AminoAcidCountDescriptor

2022-11-10 Thread John Mayfield
Hi Staffan,

Can you open a GitHub issue? Thanks. It's likely harmless but should be
silent, so it's something we can tweak.

John

On Thu, 10 Nov 2022 at 12:39, Staffan Arvidsson McShane <
staffan.arvids...@gmail.com> wrote:

> Hi,
>
> I'm getting the following warning every time I'm instantiating the
> AminoAcidCountDescriptor
> class:
> org.openscience.cdk.io.cml.CMLHandler WARN: Detected unknown convention:
> cdk:substructureList
>
> I'm not a chemist so I cannot tell if it actually outputs the correct
> descriptor results, but at least it outputs some values when I send in a
> molecule with some amino acids. Should I be concerned with the error
> message, I assume there is some bug in there somewhere? Tested in both
> version 2.7.1 and 2.8.
>
> Best,
> Staffan
> ___
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user
>


[Cdk-user] Help needed, NPM WebPack v3 => v5 migration

2022-11-08 Thread John Mayfield
Dear CDK Users,

If you are familiar with NPM and WebPack we desperately need help upgrading
the CDK website to a more modern version to avoid the constant
dependency-bot security alerts. Most of these are tolerable since they are
in dev only and not the hosted website, but the longer we wait the harder
it gets.

I have myself done a WebPack v4 => v5 migration on other projects before
which was not too bad but it took a lot of time and I don't have time to
figure this out for CDK. Perhaps v3 => v4 then v4 => v5 is the correct
path, but the tie in with bootstrap makes this a pain.

If no one can help I am considering dropping all the bootstrap complexities
and switching back to a much simpler static website. The JS/NPM package
management is just such a mess.

Many Thanks,
John


Re: [Cdk-user] support for RXN and RDF

2022-11-03 Thread John Mayfield
I think there is one in BEAM already; it turns out to be quite handy when
parsing text data a character at a time. Basically, as you can see from the
API, it provides utilities to peek ahead and to move ahead if the next chars
are a given substring.
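
For readers without the attachment, the idea can be sketched in a few lines of
plain Java. This is a hypothetical minimal version: the class name
CharIterSketch and the methods peek(), next(), nextIf() and rest() are
illustrative guesses and are not taken from BEAM or from the attached
CharIter.java.

```java
// Hypothetical minimal character iterator; names are illustrative only.
final class CharIterSketch {
    private final String str;
    private int pos;

    CharIterSketch(String str) {
        this.str = str;
    }

    boolean hasNext() {
        return pos < str.length();
    }

    // Look at the next character without consuming it ('\0' at end of input).
    char peek() {
        return hasNext() ? str.charAt(pos) : '\0';
    }

    // Consume and return the next character.
    char next() {
        return str.charAt(pos++);
    }

    // Move ahead only if the upcoming characters are exactly `prefix`.
    boolean nextIf(String prefix) {
        if (str.startsWith(prefix, pos)) {
            pos += prefix.length();
            return true;
        }
        return false;
    }

    // Consume and return everything that is left.
    String rest() {
        String remaining = str.substring(pos);
        pos = str.length();
        return remaining;
    }

    public static void main(String[] args) {
        CharIterSketch it = new CharIterSketch("$DTYPE yield");
        if (it.nextIf("$DTYPE ")) {
            System.out.println("data field: " + it.rest()); // prints data field: yield
        }
    }
}
```

The point of nextIf() is that a failed match leaves the position untouched, so
a line-oriented reader can try one keyword after another without backtracking.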



On Thu, 3 Nov 2022 at 06:46, Uli Fechner  wrote:

> Finally found some time to have a go at this.
>
> References to the class CharIter pop up quite a bit in RdfReader. Do
> you have an implementation of this class that you can provide?
> Alternatively, the functionality can probably be guessed from the
> method names and then replicated, but I'd prefer to just take whatever
> you can provide.
>
>
> Uli Fechner
>
> Senior Software Developer
>
> Pending AI
>
>
>
>
> u...@pending.ai
> https://pending.ai/
> The National Innovation Centre, Suite 112, 4 Cornwallis St., NSW 2015,
> Australia
>
>
> On Fri, Oct 21, 2022 at 12:16 AM John Mayfield
>  wrote:
> >
> > Sure... it's currently intended to be Toolkit agnostic, Rdf (the CTfile
> kind not the triple kind) is kind of a meta-format.
> >
> > John
> >
> > On Thu, 20 Oct 2022 at 12:30, Uli Fechner  wrote:
> >>
> >> I have to check what our priorities are at the moment but if I find the
> time would you be okay with me merging this in CDK?
> >>
> >> On Thu, Oct 20, 2022 at 7:30 PM John Mayfield <
> john.wilkinson...@gmail.com> wrote:
> >>>
> >>> Correct, my plan was to add it when I rewrote the MDL/CTfile stack
> which is needed. This is a v. large effort though and the danger in adding the
> RDF support is I would just redo it in future. However since the format is
> quite trivial it's possible to have some local code to do this (I did the
> same with RInChI).
> >>>
> >>> I may merge in the attached but I have very little time at the moment.
> >>>
> >>> On Thu, 20 Oct 2022 at 07:44, Uli Fechner  wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I had a good look at the code but would appreciate it if you confirm
> what my assessment is:
> >>>>
> >>>> 1) CDK has support for reading RXN V2000 and RXN V3000 and support
> for writing RXN V2000.
> >>>>
> >>>> 2) There is no support for RDF at the moment, neither for reading nor
> for writing.
> >>>>
> >>>> Is that correct? Are there any limitations or known issues when it
> comes to the file formats mentioned above?
> >>>>
> >>>> Thanks.
> >>>> Uli
> >>>> ___
> >>>> Cdk-user mailing list
> >>>> Cdk-user@lists.sourceforge.net
> >>>> https://lists.sourceforge.net/lists/listinfo/cdk-user
>


CharIter.java
Description: Binary data


Re: [Cdk-user] support for RXN and RDF

2022-10-20 Thread John Mayfield
Sure... it's currently intended to be Toolkit agnostic, Rdf (the CTfile
kind not the triple kind) is kind of a meta-format.

John

On Thu, 20 Oct 2022 at 12:30, Uli Fechner  wrote:

> I have to check what our priorities are at the moment but if I find the
> time would you be okay with me merging this in CDK?
>
> On Thu, Oct 20, 2022 at 7:30 PM John Mayfield 
> wrote:
>
>> Correct, my plan was to add it when I rewrote the MDL/CTfile stack which
>> is needed. This is a v. large effort though and the danger in adding the RDF
>> support is I would just redo it in future. However since the format is
>> quite trivial it's possible to have some local code to do this (I did the
>> same with RInChI).
>>
>> I may merge in the attached but I have very little time at the moment.
>>
>> On Thu, 20 Oct 2022 at 07:44, Uli Fechner  wrote:
>>
>>> Hi,
>>>
>>> I had a good look at the code but would appreciate it if you confirm
>>> what my assessment is:
>>>
>>> 1) CDK has support for reading RXN V2000 and RXN V3000 and support for
>>> writing RXN V2000.
>>>
>>> 2) There is no support for RDF at the moment, neither for reading nor
>>> for writing.
>>>
>>> Is that correct? Are there any limitations or known issues when it comes
>>> to the file formats mentioned above?
>>>
>>> Thanks.
>>> Uli
>>> ___
>>> Cdk-user mailing list
>>> Cdk-user@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/cdk-user
>>>
>>


Re: [Cdk-user] atom to atom mapping for reactions

2022-10-20 Thread John Mayfield
Depends on the data, journal reactions are more esoteric. People like to
try and eval/test atom mapping with hard/rare reactions. Common reaction
mechanisms used day-to-day are common because they're useful building
blocks :-).

Something I always thought it (IBM RxnMapper) would benefit from is some
basic checks, it actually gives multiple results with a probability. But
some basic empirical calculations could be used to clean things up - e.g.
number of C-C bonds broken, bonds formed etc.
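
A rough sketch of that kind of empirical check, in plain Java. Everything here
is an assumption for illustration: bonds are reduced to pairs of atom-map
numbers, bond orders and element types are ignored, and the class and method
names (MappingCheckSketch, bondsBroken, bondsFormed) are made up. A candidate
mapping with fewer broken and formed bonds would be preferred.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: bonds are keyed by the atom-map numbers of their ends.
final class MappingCheckSketch {

    // Undirected bond key, smaller map number first.
    static String bond(int a, int b) {
        return Math.min(a, b) + "-" + Math.max(a, b);
    }

    // Number of bonds present in `from` but absent in `in`.
    private static int countMissing(Set<String> from, Set<String> in) {
        Set<String> missing = new HashSet<>(from);
        missing.removeAll(in);
        return missing.size();
    }

    // Bonds that exist in the reactants but not in the products.
    static int bondsBroken(Set<String> reactant, Set<String> product) {
        return countMissing(reactant, product);
    }

    // Bonds that exist in the products but not in the reactants.
    static int bondsFormed(Set<String> reactant, Set<String> product) {
        return countMissing(product, reactant);
    }

    public static void main(String[] args) {
        Set<String> reactant = Set.of(bond(1, 2), bond(2, 3), bond(4, 5));
        Set<String> product = Set.of(bond(1, 2), bond(2, 4), bond(4, 5));
        System.out.println("broken=" + bondsBroken(reactant, product)
                + " formed=" + bondsFormed(reactant, product));
        // prints: broken=1 formed=1
    }
}
```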

At NextMove we have an "amap bench" utility (I think from 2013) which tells
us how good a mapping is. I still need to integrate this to remove some
dubious ones from Indigo but my view is supplementing the RXNMapper with
this would be good.

https://nextmovesoftware.com/posters/Sayle_HazELNutExtractReactionsELN_ACS_201309.pdf
https://nextmovesoftware.com/talks/Sayle_ReactionProcessing_Sheffield_201307.pdf

On Thu, 20 Oct 2022 at 10:52, Uli Fechner  wrote:

> Okay, thank you.
>
> We do use RXNMapper and in our experience the numbers they state in their
> publication are a bit optimistic...
>
> On Thu, Oct 20, 2022 at 7:25 PM John Mayfield 
> wrote:
>
>> It does not, personally today I would use the IBM AI based one.
>>
>> On Thu, 20 Oct 2022 at 07:05, Uli Fechner  wrote:
>>
>>> Hi,
>>>
>>> I vaguely remember that there is/was code in the CDK for atom-to-atom
>>> mapping of reactions (wasn't that RDTool?). However, I cannot find that now.
>>>
>>> RDTool by Asad uses CDK. But its maintenance seems to be lacking a bit
>>> (last pre-release based on CDK 2.5 is back from Mar 2021).
>>>
>>> Does CDK provide atom-to-atom mapping for reactions at the moment? Are
>>> there any open-source tools out there other than RDTool that offer that
>>> functionality and use CDK as a library?
>>>
>>> Any help is appreciated.
>>> Uli
>>>
>>> ___
>>> Cdk-user mailing list
>>> Cdk-user@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/cdk-user
>>>
>>


Re: [Cdk-user] support for RXN and RDF

2022-10-20 Thread John Mayfield
Correct, my plan was to add it when I rewrote the MDL/CTfile stack which is
needed. This is a v. large effort though and the danger in adding the RDF
support is I would just redo it in future. However since the format is
quite trivial it's possible to have some local code to do this (I did the
same with RInChI).

I may merge in the attached but I have very little time at the moment.

On Thu, 20 Oct 2022 at 07:44, Uli Fechner  wrote:

> Hi,
>
> I had a good look at the code but would appreciate it if you confirm what
> my assessment is:
>
> 1) CDK has support for reading RXN V2000 and RXN V3000 and support for
> writing RXN V2000.
>
> 2) There is no support for RDF at the moment, neither for reading nor for
> writing.
>
> Is that correct? Are there any limitations or known issues when it comes
> to the file formats mentioned above?
>
> Thanks.
> Uli
> ___
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user
>


RdfRecord.java
Description: Binary data


RdfReader.java
Description: Binary data


Re: [Cdk-user] atom to atom mapping for reactions

2022-10-20 Thread John Mayfield
It does not, personally today I would use the IBM AI based one.

On Thu, 20 Oct 2022 at 07:05, Uli Fechner  wrote:

> Hi,
>
> I vaguely remember that there is/was code in the CDK for atom-to-atom
> mapping of reactions (wasn't that RDTool?). However, I cannot find that now.
>
> RDTool by Asad uses CDK. But its maintenance seems to be lacking a bit
> (last pre-release based on CDK 2.5 is back from Mar 2021).
>
> Does CDK provide atom-to-atom mapping for reactions at the moment? Are
> there any open-source tools out there other than RDTool that offer that
> functionality and use CDK as a library?
>
> Any help is appreciated.
> Uli
>
> ___
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user
>


[Cdk-user] New SNAPSHOTS repo

2022-09-19 Thread John Mayfield
Hi All,

If you use the CDK snapshot builds on sonatype repo, we've moved to a new
host to speed up deployments.

https://s01.oss.sonatype.org/content/repositories/snapshots/org/openscience/cdk/

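The new config would be

<repository>
  <url>https://s01.oss.sonatype.org/content/repositories/snapshots/</url>
  <snapshots>
    <enabled>true</enabled>
  </snapshots>
  <releases>
    <enabled>false</enabled>
  </releases>
</repository>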

The old URL is still working for the moment but you will need to migrate to
use the 2.9-SNAPSHOT builds.

John


Re: [Cdk-user] NoSuchAtomException when switching to AtomContainer2

2022-09-05 Thread John Mayfield
Yes, Java doesn't; the bytecode/JVM does :-). Here is the setup to show it
would break without recompilation downstream:

[john@sentinel:test]% cat Upstream_void.java
public class Upstream {
    static void Hello() {
        System.out.println("Hello (void)!");
    }
}

[john@sentinel:test]% cat Upstream_String.java
public class Upstream {
    static String Hello() {
        System.out.println("Hello (String)!");
        return "Why oh why!?!";
    }
}

[john@sentinel:test]% cat Main.java
public class Main {
    public static void main(String[] args) {
        Upstream.Hello();
    }
}

We compile/run this like this:

[john@sentinel:test]% cp Upstream_void.java Upstream.java && javac Upstream.java Main.java && java -cp . Main
Hello (void)!

Now if I try it with Upstream_String instead and *DON'T* recompile
Main.java you should see the following error:

[john@sentinel:test]% cp Upstream_String.java Upstream.java && javac Upstream.java && java -cp . Main
Exception in thread "main" java.lang.NoSuchMethodError: 'void Upstream.Hello()'
        at Main.main(Main.java:3)

If you recompile Main.java it's OK again.

[john@sentinel:test]% cp Upstream_String.java Upstream.java && javac Upstream.java Main.java && java -cp . Main
Hello (String)!
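
The 'void Upstream.Hello()' in that error is the JVM-level descriptor the
compiled call site asks for. As a small illustration in plain Java (the class
name DescriptorSketch and its helper are made up), java.lang.invoke.MethodType
can print descriptors, and the return type is visibly part of them:

```java
import java.lang.invoke.MethodType;

// Illustrative helper that prints JVM method descriptors.
final class DescriptorSketch {

    // Descriptor string for a method with the given return and parameter types.
    static String descriptor(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Same name and parameters, different return type: different descriptor.
        System.out.println(descriptor(void.class));   // prints ()V
        System.out.println(descriptor(String.class)); // prints ()Ljava/lang/String;
    }
}
```

A call compiled against ()V will not link against a method that now has
descriptor ()Ljava/lang/String; which is why recompiling Main.java fixes it.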

J

On Mon, 5 Sept 2022 at 15:12, Uli Fechner  wrote:

> To the best of my knowledge the return type is not part of the method
> signature as such in Java.
>
> You might think of covariant return types which come into play when
> overriding a method in a child class, and the child’s return type can be a
> subtype of the parent’s return type. This was introduced in Java 5. If the
> parent's method return type is void, however, you won't be able to change
> the return type of the child's method.
>
> On Tue, Sep 6, 2022 at 12:04 AM John Mayfield 
> wrote:
>
>> I did contemplate making the addAtom() return the new "ref" the container
>> has... however I think this is technically an API breakage (void => IAtom
>> return type). This would not have been a breakage in prior Java versions
>> but at some point they added the return type to the signature. It's kind of
>> a minor breakage: provided the downstream code gets recompiled, it's OK.
>>
>> On Mon, 5 Sept 2022 at 14:59, John Mayfield 
>> wrote:
>>
>>> Yup! Sorry just drafting stuff out :-) Made the same mistake in the PR
>>> which used predicates:
>>> https://github.com/cdk/cdk/pull/889/commits/a160def65f79218a410c8fa4fe6ece5e2ed40dde
>>>
>>>
>>> On Mon, 5 Sept 2022 at 14:47, Daniel Katzel  wrote:
>>>
>>>> Surely that line output.getAtom( atom.getAtomicNumber() -1 )
>>>>
>>>> Was meant to use output.getAtomCount() -1
>>>>
>>>> To get the last atom added?
>>>>
>>>> On Mon, Sep 5, 2022, 6:44 AM John Mayfield 
>>>> wrote:
>>>>
>>>>> Yes and it's correct, your code is adding bonds to atoms which don't
>>>>> exist yet! You need to add the atoms of the bond before the bond - 0..k 
>>>>> not
>>>>> k .. n.
>>>>>
>>>>> As an aside the Mappings API has this method
>>>>> already (toSubstructureStream()) if you're coming via CDK substructure
>>>>> Pattern (although it doesn't do the radials/lone pairs... possible but I
>>>>> never found a use to have them explicitly like that).
>>>>>
>>>>> In general this code is an inefficient way to build the container... I
>>>>> think it's O(N^3) - but perhaps only O(N^2) with AtomContainer2 :-). Much
>>>>> better to add all the atoms, record these in a set/map, then loop all the
>>>>> bonds. Optionally for best performance you should do this "resync"
>>>>> operation where you map the source => target AtomRef. Maybe we should add
>>>>> this to the ACmanipulator, I thought there was something similar
>>>>> already though.
>>>>>
>>>>> John
>>>>>
>>>>> public static IAtomContainer extractSubstructure(IAtomContainer source,
>>>>> List<IAtom> atoms) {
>>>>> IAtomContainer output = source.getBuilder().newAtomContainer();
>>>>> Map<IAtom, IAtom> remap = new HashMap<>();
>>>>> for (IAtom atom : atoms) {
>>>>> output.addAtom(atom);
>>>>> 
>>>>> source.getConne

Re: [Cdk-user] NoSuchAtomException when switching to AtomContainer2

2022-09-05 Thread John Mayfield
I did contemplate making the addAtom() return the new "ref" the container
has... however I think this is technically an API breakage (void => IAtom
return type). This would not have been a breakage in prior Java versions
but at some point they added the return type to the signature. It's kind of
a minor breakage: provided the downstream code gets recompiled, it's OK.

On Mon, 5 Sept 2022 at 14:59, John Mayfield 
wrote:

> Yup! Sorry just drafting stuff out :-) Made the same mistake in the PR
> which used predicates:
> https://github.com/cdk/cdk/pull/889/commits/a160def65f79218a410c8fa4fe6ece5e2ed40dde
>
>
> On Mon, 5 Sept 2022 at 14:47, Daniel Katzel  wrote:
>
>> Surely that line output.getAtom( atom.getAtomicNumber() -1 )
>>
>> Was meant to use output.getAtomCount() -1
>>
>> To get the last atom added?
>>
>> On Mon, Sep 5, 2022, 6:44 AM John Mayfield 
>> wrote:
>>
>>> Yes and it's correct, your code is adding bonds to atoms which don't
>>> exist yet! You need to add the atoms of the bond before the bond - 0..k not
>>> k .. n.
>>>
>>> As an aside the Mappings API has this method
>>> already (toSubstructureStream()) if you're coming via CDK substructure
>>> Pattern (although it doesn't do the radials/lone pairs... possible but I
>>> never found a use to have them explicitly like that).
>>>
>>> In general this code is an inefficient way to build the container... I
>>> think it's O(N^3) - but perhaps only O(N^2) with AtomContainer2 :-). Much
>>> better to add all the atoms, record these in a set/map, then loop all the
>>> bonds. Optionally for best performance you should do this "resync"
>>> operation where you map the source => target AtomRef. Maybe we should add
>>> this to the ACmanipulator, I thought there was something similar
>>> already though.
>>>
>>> John
>>>
>>> public static IAtomContainer extractSubstructure(IAtomContainer source,
>>>                                                  List<IAtom> atoms) {
>>>     IAtomContainer output = source.getBuilder().newAtomContainer();
>>>     Map<IAtom, IAtom> remap = new HashMap<>();
>>>     for (IAtom atom : atoms) {
>>>         output.addAtom(atom);
>>>         source.getConnectedLonePairsList(atom).forEach(output::addLonePair);
>>>         source.getConnectedSingleElectronsList(atom).forEach(output::addSingleElectron);
>>>         // resync: get the AtomRef in the context of the new container. This
>>>         // presumes atoms get added at the last position, which is currently
>>>         // always the case
>>>         remap.put(atom, output.getAtom(atom.getAtomicNumber() - 1));
>>>     }
>>>     for (IBond bond : source.bonds()) {
>>>         IAtom beg = remap.get(bond.getBegin());
>>>         IAtom end = remap.get(bond.getEnd());
>>>         if (beg != null && end != null) {
>>>             output.addBond(bond);
>>>
>>>             // or, more efficiently (but you get a "NEW" bond, so you may need
>>>             // to do some fudging with setting aromaticity/ring membership
>>>             // etc.; however, if you're selecting a substructure you MUST
>>>             // recalculate these anyway):
>>>             // output.addBond(beg.getIndex(), end.getIndex(),
>>>             //                bond.getOrder(), bond.getStereo());
>>>         }
>>>     }
>>>     return output;
>>> }
>>>
>>>
>>> On Mon, 5 Sept 2022 at 11:10, Uli Fechner  wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>> I get a NoSuchAtomException when executing the following code (that I
>>>> inherited):
>>>>
>>>> public static IAtomContainer extractSubstructure(IAtomContainer source,
>>>> List<IAtom> atoms) {
>>>> IAtomContainer output =
>>>> SilentChemObjectBuilder.getInstance().newAtomContainer();
>>>> int k = 0;
>>>> for (IAtom atom : atoms) {
>>>> output.addAtom(atom);
>>>> source.getConnectedLonePairsList(atom).forEach(lp -> 
>>>> output.addLonePair(lp));
>>>> source.getConnectedSingleElectronsList(atom).forEach(se -> 
>>>> output.addSingleElectron(se));
>>>> k++;
>>>> for (int i = k; i < atoms.size(); i++) {
>>>> IBond bond = source.getBond(atom, atoms.get(i));
>>>> if (bond != null) {
>>>> output.addBond(bond);
>>>> 

Re: [Cdk-user] NoSuchAtomException when switching to AtomContainer2

2022-09-05 Thread John Mayfield
Yup! Sorry just drafting stuff out :-) Made the same mistake in the PR
which used predicates:
https://github.com/cdk/cdk/pull/889/commits/a160def65f79218a410c8fa4fe6ece5e2ed40dde
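
For reference, the "add all the atoms, record them in a map, then loop the
bonds" approach from the quoted message below can be sketched without CDK. The
Graph class here is a made-up stand-in, not IAtomContainer: nodes are labels,
edges are index pairs, and addNode() returns the index the new node received
(which is always the last position, nodes.size() - 1).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in for a container; not CDK's API.
final class SubgraphSketch {

    static final class Graph {
        final List<String> nodes = new ArrayList<>();
        final List<int[]> edges = new ArrayList<>();

        // Returns the index of the node just added (the last position).
        int addNode(String label) {
            nodes.add(label);
            return nodes.size() - 1;
        }

        // Both end points must already be in this graph.
        void addEdge(int a, int b) {
            if (a < 0 || b < 0 || a >= nodes.size() || b >= nodes.size()) {
                throw new IllegalArgumentException("node is not a member of this graph");
            }
            edges.add(new int[] {a, b});
        }
    }

    // Copy the selected nodes first, then every edge whose two ends were selected.
    static Graph extract(Graph source, List<Integer> keep) {
        Graph out = new Graph();
        Map<Integer, Integer> remap = new HashMap<>(); // source index -> new index
        for (int idx : keep) {
            remap.put(idx, out.addNode(source.nodes.get(idx)));
        }
        for (int[] edge : source.edges) {
            Integer beg = remap.get(edge[0]);
            Integer end = remap.get(edge[1]);
            if (beg != null && end != null) {
                out.addEdge(beg, end);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Graph ethanol = new Graph(); // C-C-O, hydrogens left out
        int c1 = ethanol.addNode("C");
        int c2 = ethanol.addNode("C");
        int o = ethanol.addNode("O");
        ethanol.addEdge(c1, c2);
        ethanol.addEdge(c2, o);

        Graph sub = extract(ethanol, List.of(c2, o));
        System.out.println(sub.nodes + " edges=" + sub.edges.size());
        // prints: [C, O] edges=1
    }
}
```

Each node and each edge is visited once, and an edge is only added after both
of its ends are known to be in the new graph.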


On Mon, 5 Sept 2022 at 14:47, Daniel Katzel  wrote:

> Surely that line output.getAtom( atom.getAtomicNumber() -1 )
>
> Was meant to use output.getAtomCount() -1
>
> To get the last atom added?
>
> On Mon, Sep 5, 2022, 6:44 AM John Mayfield 
> wrote:
>
>> Yes and it's correct, your code is adding bonds to atoms which don't
>> exist yet! You need to add the atoms of the bond before the bond - 0..k not
>> k .. n.
>>
>> As an aside the Mappings API has this method
>> already (toSubstructureStream()) if you're coming via CDK substructure
>> Pattern (although it doesn't do the radials/lone pairs... possible but I
>> never found a use to have them explicitly like that).
>>
>> In general this code is an inefficient way to build the container... I
>> think it's O(N^3) - but perhaps only O(N^2) with AtomContainer2 :-). Much
>> better to add all the atoms, record these in a set/map, then loop all the
>> bonds. Optionally for best performance you should do this "resync"
>> operation where you map the source => target AtomRef. Maybe we should add
>> this to the ACmanipulator, I thought there was something similar
>> already though.
>>
>> John
>>
>> public static IAtomContainer extractSubstructure(IAtomContainer source,
>>                                                  List<IAtom> atoms) {
>>     IAtomContainer output = source.getBuilder().newAtomContainer();
>>     Map<IAtom, IAtom> remap = new HashMap<>();
>>     for (IAtom atom : atoms) {
>>         output.addAtom(atom);
>>         source.getConnectedLonePairsList(atom).forEach(output::addLonePair);
>>         source.getConnectedSingleElectronsList(atom).forEach(output::addSingleElectron);
>>         // resync: get the AtomRef in the context of the new container. This
>>         // presumes atoms get added at the last position, which is currently
>>         // always the case
>>         remap.put(atom, output.getAtom(atom.getAtomicNumber() - 1));
>>     }
>>     for (IBond bond : source.bonds()) {
>>         IAtom beg = remap.get(bond.getBegin());
>>         IAtom end = remap.get(bond.getEnd());
>>         if (beg != null && end != null) {
>>             output.addBond(bond);
>>
>>             // or, more efficiently (but you get a "NEW" bond, so you may need
>>             // to do some fudging with setting aromaticity/ring membership
>>             // etc.; however, if you're selecting a substructure you MUST
>>             // recalculate these anyway):
>>             // output.addBond(beg.getIndex(), end.getIndex(),
>>             //                bond.getOrder(), bond.getStereo());
>>         }
>>     }
>>     return output;
>> }
>>
>>
>> On Mon, 5 Sept 2022 at 11:10, Uli Fechner  wrote:
>>
>>>
>>> Hi,
>>>
>>> I get a NoSuchAtomException when executing the following code (that I
>>> inherited):
>>>
>>> public static IAtomContainer extractSubstructure(IAtomContainer source, 
>>> List atoms) {
>>> IAtomContainer output = 
>>> SilentChemObjectBuilder.getInstance().newAtomContainer();
>>> int k = 0;
>>> for (IAtom atom : atoms) {
>>> output.addAtom(atom);
>>> source.getConnectedLonePairsList(atom).forEach(lp -> 
>>> output.addLonePair(lp));
>>> source.getConnectedSingleElectronsList(atom).forEach(se -> 
>>> output.addSingleElectron(se));
>>> k++;
>>> for (int i = k; i < atoms.size(); i++) {
>>> IBond bond = source.getBond(atom, atoms.get(i));
>>> if (bond != null) {
>>> output.addBond(bond);
>>> }
>>> }
>>> }
>>> return output;
>>> }
>>>
>>> The stack trace starts at the line with
>>> output.addBond(bond);
>>>
>>> and continues with the following lines:
>>>
>>> org.openscience.cdk.exception.NoSuchAtomException: Atom is not a member of 
>>> this AtomContainer
>>> at 
>>> app//org.openscience.cdk.silent.AtomContainer2.getAtomRef(AtomContainer2.java:185)
>>> at 
>>> app//org.openscience.cdk.silent.AtomContainer2.newBondRef(AtomContainer2.java:221)
>>> at 
>>> app//org.openscience.cdk.silent.AtomContainer2.addBond(AtomContainer2.java:908)
>>>
>>> The exception *isn't* thrown if I replace the instantiation

Re: [Cdk-user] NoSuchAtomException when switching to AtomContainer2

2022-09-05 Thread John Mayfield
Yes and it's correct, your code is adding bonds to atoms which don't exist
yet! You need to add the atoms of the bond before the bond - 0..k not k ..
n.

As an aside, the Mappings API already has this method
(toSubstructureStream()) if you're coming via the CDK substructure
Pattern (although it doesn't do the radicals/lone pairs... possible but I
never found a use for having them explicitly like that).

In general this code is an inefficient way to build the container... I
think it's O(N^3) - but perhaps only O(N^2) with AtomContainer2 :-). It is
much better to add all the atoms, record these in a set/map, then loop over
all the bonds. Optionally, for best performance you should do this "resync"
operation where you map the source => target AtomRef. Maybe we should add
this to the AtomContainerManipulator; I thought there was something similar
already though.

John

public static IAtomContainer extractSubstructure(IAtomContainer source, List<IAtom> atoms) {
    IAtomContainer output = source.getBuilder().newAtomContainer();
    Map<IAtom, IAtom> remap = new HashMap<>();
    for (IAtom atom : atoms) {
        output.addAtom(atom);
        source.getConnectedLonePairsList(atom).forEach(output::addLonePair);
        source.getConnectedSingleElectronsList(atom).forEach(output::addSingleElectron);
        // resync: get the AtomRef in the context of the new container. This
        // presumes atoms get added at the last position, which is currently
        // always the case
        remap.put(atom, output.getAtom(output.getAtomCount() - 1));
    }
    for (IBond bond : source.bonds()) {
        IAtom beg = remap.get(bond.getBegin());
        IAtom end = remap.get(bond.getEnd());
        if (beg != null && end != null) {
            output.addBond(bond);

            // or the more efficient way, but you get a "NEW" bond so may need to do
            // some fudging with setting aromaticity/ring membership etc; however, if
            // you're selecting a substructure you MUST recalculate these anyway.
            // output.addBond(beg.getIndex(), end.getIndex(), bond.getOrder(), bond.getStereo());
        }
    }
    return output;
}
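
A minimal sketch of the two-pass idea described above (add all the atoms,
record them in a map, then loop over the bonds once). It deliberately uses
plain Java collections instead of CDK types, so SubgraphSketch and
inducedBonds are hypothetical names: atoms are just source indices and bonds
are int pairs. Mapping each atom to the last position of the output mirrors
the getAtomCount() - 1 resync step.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, not the CDK API: atoms are source indices, bonds are {beg, end} pairs.
final class SubgraphSketch {

    // Returns the bonds whose two ends are both selected, renumbered to the
    // positions the atoms occupy in the output (the order of 'selected').
    static List<int[]> inducedBonds(List<int[]> sourceBonds, List<Integer> selected) {
        List<Integer> outputAtoms = new ArrayList<>();
        Map<Integer, Integer> remap = new HashMap<>();

        // Pass 1: add every atom first and record source index -> output index.
        for (Integer srcIdx : selected) {
            outputAtoms.add(srcIdx);
            // the atom just added sits at the last position of the output
            remap.put(srcIdx, outputAtoms.size() - 1);
        }

        // Pass 2: a single sweep over the source bonds; a bond is kept only
        // when both of its ends were added in pass 1.
        List<int[]> outputBonds = new ArrayList<>();
        for (int[] bond : sourceBonds) {
            Integer beg = remap.get(bond[0]);
            Integer end = remap.get(bond[1]);
            if (beg != null && end != null) {
                outputBonds.add(new int[] {beg, end});
            }
        }
        return outputBonds;
    }
}
```

Because every atom exists in the output before any bond is added, the order
of the selected atoms no longer matters, which is the property the original
nested loop lacked.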


On Mon, 5 Sept 2022 at 11:10, Uli Fechner  wrote:

>
> Hi,
>
> I get a NoSuchAtomException when executing the following code (that I
> inherited):
>
> public static IAtomContainer extractSubstructure(IAtomContainer source, List<IAtom> atoms) {
>     IAtomContainer output = SilentChemObjectBuilder.getInstance().newAtomContainer();
>     int k = 0;
>     for (IAtom atom : atoms) {
>         output.addAtom(atom);
>         source.getConnectedLonePairsList(atom).forEach(lp -> output.addLonePair(lp));
>         source.getConnectedSingleElectronsList(atom).forEach(se -> output.addSingleElectron(se));
>         k++;
>         for (int i = k; i < atoms.size(); i++) {
>             IBond bond = source.getBond(atom, atoms.get(i));
>             if (bond != null) {
>                 output.addBond(bond);
>             }
>         }
>     }
>     return output;
> }
>
> The stack trace starts at the line with
> output.addBond(bond);
>
> and continues with the following lines:
>
> org.openscience.cdk.exception.NoSuchAtomException: Atom is not a member of this AtomContainer
>   at app//org.openscience.cdk.silent.AtomContainer2.getAtomRef(AtomContainer2.java:185)
>   at app//org.openscience.cdk.silent.AtomContainer2.newBondRef(AtomContainer2.java:221)
>   at app//org.openscience.cdk.silent.AtomContainer2.addBond(AtomContainer2.java:908)
>
> The exception *isn't* thrown if I replace the instantiation of the
> IAtomContainer output with
> IAtomContainer output = new AtomContainer();
> I understand that the exception has something to do with the way
> AtomContainer2 is implemented, but I don't know how to switch to the
> AtomContainer2 implementation and still retain the intent of the code -
> which is to only add atoms, their connected LPs, their connected single
> electrons and their connected bonds, but not connected atoms - to the
> returned subgraph.
>
> Best
> Uli


Re: [Cdk-user] using SmiFlavor.AtomAtomMapRenumber for canonical SMILES with mapping

2022-09-03 Thread John Mayfield
The flags are meant to be bit-wise combined, BTW (see the doc here:
https://cdk.github.io/cdk/latest/docs/api/index.html?org/openscience/cdk/smiles/SmilesGenerator.html
):

I think you probably just want:

SmilesGenerator generator1 = new SmilesGenerator(SmiFlavor.Absolute | SmiFlavor.AtomAtomMap);

or

SmilesGenerator generator1 = new SmilesGenerator(SmiFlavor.Canonical | SmiFlavor.AtomAtomMap);


Re: [Cdk-user] Merge all IAtomContainers in an AtomContainerSet into a single IAtomContainer

2022-09-01 Thread John Mayfield
Any CDK "I..." class should be created via the builder.
builder.newInstance(IAtom.class, "C") can be used to pass arguments to
the constructor. However, newAtomContainer(), newAtom() and newBond() are
there for speed, as the rest is done via reflection, which is best
avoided. Also make sure you use cdk-silent (SilentChemObjectBuilder) and
NOT cdk-data (DefaultChemObjectBuilder) objects, since the latter is only
needed if you want notifications/listeners. CDK v2.0 may rename these
"(Standard)ChemObjectBuilder" (silent) and "NotifyChemObjectBuilder" (data)
to make it clearer which one you probably need to use.


> one component group would be assigned to *[Na+].[Cl-]* and a 2nd
> component group to *c1c1*? So the component grouping adds a
> semantical layer on top of the individual components?


Yep that's correct.


On Thu, 1 Sept 2022 at 13:26, Uli Fechner  wrote:

> Hi John,
>
> as always - thank you for your helpful reply, much appreciated.
>
> I'd be all for deleting/changing AtomContainerSet - I think both of us
> expressed our dissatisfaction with that class in a prior convo.
>
> Thanks for pointing out the issue regarding the instantiation of
> AtomContainer - there certainly is a bit of "new AtomContainer" in my code
> that I am going to replace. As the builder also has method calls for
> handing out atoms and bonds I assume that I should replace any
> instantiations that use the constructor for those objects, too?
>
> Regarding your code that uses the Java Spliterator Interface:
>
> IAtomContainer combined = molset.getBuilder().newAtomContainer();
> molset.atomContainers().spliterator().forEachRemaining(combined::add);
>
> I love the Stream API, but haven't used the Spliterator interface. My
> understanding is that the spliterator (also) is about being able to
> parallelize tasks which would then make the modification of collections
> that sit outside the spliterator chain of calls - as is the case for the
> IAtomContainer *combined* - potentially unsafe in a multithreading
> environment. However, I am not sure if I understood this correctly.
>
> Indeed, I deal with reactions :) I understand what a ReactionRole is. But
> I am not 100% sure if I understand component grouping. In your example of
> reactants
>
> *[Na+].[Cl-].c1c1>>*
>
> one component group would be assigned to *[Na+].[Cl-]* and a 2nd
> component group to *c1c1*? So the component grouping adds a
> semantical layer on top of the individual components?
>
> Best wishes
> Uli
>

Re: [Cdk-user] Merge all IAtomContainers in an AtomContainerSet into a single IAtomContainer

2022-09-01 Thread John Mayfield
Hi Uli,

The code you have is the correct (almost) way to do it...  I'm not sure
a utility method is needed since it's simple/easy enough and clear what is
happening.

The more correct code should not create the AtomContainer with "new", since
the main class ATM is actually AtomContainer2. So things will possibly break
if you do it with "new AtomContainer" - we can't hide the constructor as that
would break downstream code. If we're going to break APIs I would rather
delete AtomContainerSet :-) (it is not even a set!).

IAtomContainer mergedAtomContainer = atomContainerSet.getBuilder().newAtomContainer();
for (IAtomContainer atomContainer : atomContainerSet.atomContainers()) {
    mergedAtomContainer.add(atomContainer);
}

You can also do it in two lines (one if you don't count the new container
construction):

IAtomContainer combined = molset.getBuilder().newAtomContainer();
molset.atomContainers().spliterator().forEachRemaining(combined::add);
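
On the Spliterator itself: as far as the JDK documentation goes,
Spliterator.forEachRemaining processes the remaining elements sequentially in
the calling thread, so the two-liner above is not parallel and behaves like a
plain for-each loop. A small JDK-only sketch of the same pattern, with lists
of integers standing in for atom containers (MergeSketch and merge are
hypothetical names, not CDK API):

```java
import java.util.ArrayList;
import java.util.List;

final class MergeSketch {

    // Collects the elements of every part into one list, in iteration order.
    static <T> List<T> merge(Iterable<List<T>> parts) {
        List<T> combined = new ArrayList<>();
        // Sequential traversal on the current thread; equivalent to
        // parts.forEach(combined::addAll).
        parts.spliterator().forEachRemaining(combined::addAll);
        return combined;
    }
}
```

Writing into the outer collection only becomes unsafe if the traversal is
explicitly made parallel (for example via a parallel stream), which this
code does not do.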

I presume you're dealing with reactions, in which case please look at the
*ReactionManipulator.toMolecule()* function. This collapses the reaction
into a single flat molecule, but sets the ReactionRole and ReactionGroup
properties so you can reverse it or pick things/subset more easily. In your
function here, if you come from a reaction you will remove any component
grouping - e.g. *[Na+].[Cl-].c1c1>>* would be 2 AtomContainers in an
AtomContainerSet; if you convert this with your method and then split again
you get 3 AtomContainers.

*More info on AtomContainer2*

AtomContainer2 will become AtomContainer shortly (example of errors you get
- https://github.com/cdk/cdk/issues/607, explanation -
https://github.com/cdk/cdk/wiki/AtomContainer2). Basically it's a backwards
compatible way of making the containers performant, but we needed a
staggered introduction.

Best,
John

On Thu, 1 Sept 2022 at 06:08, Uli Fechner  wrote:

> Hi,
>
> I require the functionality to merge all IAtomContainers in an
> AtomContainerSet into a single IAtomContainer then having several
> disconnected components / graphs and couldn't find a suitable method in
> AtomContainerManipulator and AtomContainerSetManipulator.
>
> IAtomContainer mergedAtomContainer = new AtomContainer();
> for (IAtomContainer atomContainer: atomContainerSet.atomContainers()) {
> mergedAtomContainer.add(atomContainer);
> }
>
> If this isn't in CDK (and it is not just me not being able to find it) I
> am happy to make a PR by adding a method, probably to
> AtomContainerSetManipulator.
>
> Best
> Uli


Re: [Cdk-user] 'removing' radicals from AtomContainers

2022-08-22 Thread John Mayfield
Something to note: in SMILES there are no radicals, only undervalent
atoms. That means your formula works correctly for SMILES input, but from
other formats (e.g. MDL) you get an explicit unpaired electron added to the
container. Simple rules will get you pretty far, and there are utilities like
the CDK AtomTypeMatcher which provide a global model, but I would write what
you need since different valence models exist for different formats, and
things change based on the surroundings (e.g. oxides).

I will caution against "robo chemistry", where you try to guess what the
correct answer is; for example, to me a negative charge looks more
reasonable:

[Na+].O=C1[N-]C=CC=2C=CC(Br)=CC12.FC(F)(F)CI

Either way - I would probably avoid the getValency() field and instead
switch on the atomic number and guard against unusual charges:

int explValence = atomContainer.getBondOrderSum(atom);
switch (atomicNum) {
    case 6:
        if (charge == 0)
            max(4 - explValence, 0);
        break;
    case 7:
        if (charge == 0 && explValence > 3)
            max(5 - explValence, 0);
        else if (charge == 0)
            max(3 - explValence, 0);
        break;
}
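
One way to read the switch above is as a lookup of the implicit hydrogen
count from atomic number, formal charge and explicit valence. That reading,
the method name implicitHydrogens and the -1 "not handled" return value are
assumptions made for illustration, not CDK API. A self-contained sketch:

```java
final class ValenceSketch {

    // Returns the implicit hydrogen count for neutral C and N, or -1 when
    // the element/charge combination is not covered by these simple rules.
    static int implicitHydrogens(int atomicNum, int charge, int explValence) {
        switch (atomicNum) {
            case 6: // carbon: default valence 4
                if (charge == 0)
                    return Math.max(4 - explValence, 0);
                break;
            case 7: // nitrogen: valence 3, or 5 once 3 is exceeded
                if (charge == 0 && explValence > 3)
                    return Math.max(5 - explValence, 0);
                else if (charge == 0)
                    return Math.max(3 - explValence, 0);
                break;
            default:
                break;
        }
        return -1; // unusual charge or unsupported element: leave the atom alone
    }
}
```

For the nitrogen in the SMILES from the original question (neutral, two
single bonds, so an explicit valence of 2) this gives one hydrogen.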

Further reading:

MDL valence model:
https://www.ics.uci.edu/~dock/manuals/oechem/pyprog/mdlvalence.html
Further Reading:
https://nextmovesoftware.com/blog/2013/02/27/explicit-and-implicit-hydrogens-taking-liberties-with-valence/
OPSIN has a good valence checker:
https://github.com/dan2097/opsin/blob/c827542501516a70fb91dce09bc1b275b80d/opsin-core/src/main/java/uk/ac/cam/ch/wwmm/opsin/ValencyChecker.java

John

On Mon, 22 Aug 2022 at 08:47, Uli Fechner  wrote:

> Hi,
>
> I came across an issue today that seemed straightforward at the beginning,
> but after a while ceased to appear that easily accessible. Well, I probably
> shouldn't be surprised - I guess that is just cheminformatics at its best :)
>
> The following smiles popped up in my workflow:
> [Na+].O=C1[N]C=CC=2C=CC(Br)=CC12.FC(F)(F)CI
>
> This translates to the sole nitrogen (valency = 3, SP2) being a radical
> with no implicit hydrogen and two neighboring carbon atoms both of which
> are connected by single bonds.
>
> Irrespective of how that radical got there I want to 'remove' it by just
> adding an implicit hydrogen to the nitrogen atom.
>
> This then led to the more general question of how to remove radicals for
> common organic elements (C, N, O, P, S seems like a good start).
>
> I came up with the following formula:
>
> int numberOfUnpairedElectrons = (int) (atom.getValency() -
> atomContainer.getBondOrderSum(atom) + atom.getFormalCharge() -
> atom.getImplicitHydrogenCount());
> if (numberOfUnpairedElectrons % 2 != 0) {
>atom.setImplicitHydrogenCount(atom.getImplicitHydrogenCount() + 1);
> }
>
> As this is chemistry, I am sure there are a lot of exceptions - even if
> the elements of interest are very restricted.
>
> Is the formula above a reasonable simplification? Or am I oversimplifying
> this?
>
> Best
> Uli


Re: [Cdk-user] reaction smarts (i.e., smirks) support

2022-08-16 Thread John Mayfield
>
> *GeometryTools* "@deprecated use {@link *GeometryUtil*} moved for
> dependency reorganisation"


Specifically, I think Java AWT was the issue, so GeometryUtil should provide
like for like (and more) and is in the cdk-standard module.

> *MoleculeFactory* "@deprecated Old CDK class primarily for testing, for CDK
> Tests please use *TestMoleculeFactory* in cdk-data."


We were reorganising and decoupling tests.

SMARTS is a bit more complicated because the entire API stack was
redesigned. Fortunately the doc should be good on these. The parser (and
generator) is now just "Smarts":

http://cdk.github.io/cdk/latest/docs/api/index.html?org/openscience/cdk/smarts/Smarts.html

Then for the Atom/Bond matchers, this is now handled by an expression tree
rather than letting Java do dynamic dispatch over lots of different classes.
The key class is "Expr"(ession) and you can set/get the expression on a
query atom or bond, e.g. QueryAtom|Bond.setExpression(expr).

http://cdk.github.io/cdk/latest/docs/api/index.html?org/openscience/cdk/isomorphism/matchers/Expr.html

For matching

http://cdk.github.io/cdk/latest/docs/api/index.html?org/openscience/cdk/smarts/SmartsPattern.html
(when from SMARTS string)
or
http://cdk.github.io/cdk/latest/docs/api/index.html?org/openscience/cdk/isomorphism/DfPattern.html
(when building an IAtomContainer up for matching)

The key challenge here is for some of these APIs you may need to make sure
all molecule creation is done via the factories and not "new
AtomContainer()", that will be less important in future but might be an
issue here.

Further reading:
- https://github.com/cdk/cdk/wiki/AtomContainer2
- SMARTS API release notes:
https://github.com/cdk/cdk/wiki/2.2-Release-Notes#core

On Tue, 16 Aug 2022 at 07:38, Uli Fechner  wrote:

> I did a backward dependency analysis of the cdk-legacy module for the
> ambit2 library.
>
> These are the classes in cdk-legacy that are used by ambit2:
>
> geometry/GeometryTools.class
> isomorphism/matchers/smarts/AliphaticAtom.class
> isomorphism/matchers/smarts/AliphaticSymbolAtom.class
> isomorphism/matchers/smarts/AnyAtom.class
> isomorphism/matchers/smarts/AnyOrderQueryBond.class
> isomorphism/matchers/smarts/AromaticAtom.class
> isomorphism/matchers/smarts/AromaticQueryBond.class
> isomorphism/matchers/smarts/LogicalOperatorAtom.class
> isomorphism/matchers/smarts/OrderQueryBond.class
> isomorphism/matchers/smarts/SMARTSAtom.class
> isomorphism/matchers/smarts/SMARTSBond.class
> smiles/smarts/parser/SMARTSParser.class
> smiles/smarts/SMARTSQueryTool.class
> templates/MoleculeFactory.class
>
> All of the classes listed above are used by both classes in main and in
> source of the ambit2 library with the exception of AliphaticSymbolAtom,
> LogicalOperatorAtom and SMARTSParser that are used by classes in test only.
>
> Best
> Uli
>

Re: [Cdk-user] reaction smarts (i.e., smirks) support

2022-08-16 Thread John Mayfield
>
> Do you consider it a problem that the library (heavily) depends on the
> cdk-legacy module? In other words, will the cdk-legacy module potentially
> be removed in an upcoming CDK 3 release?


Fine for now, but it needs someone to go through and put in the new API
calls. For most of the stuff in legacy (with the exception of SMSD) there
should be a better way to do it now; for SMSD there is a new
independent version (we tried to merge it but ultimately it's a lot of code
and works well as a standalone app). If you give me the errors that occur
(class not found) when you remove cdk-legacy I can probably give you the list
of what to use instead. Since you already have a local Ambit version it
shouldn't be a problem to try them out.

John

On Tue, 16 Aug 2022 at 03:36, Uli Fechner  wrote:

> Thank you John for the helpful information.
>
> The <dependencyManagement> declarations seem like an elegant solution. In
> my understanding, this only works if the project itself and its dependent
> projects are maven-based, i.e. there is no mix-and-match between
> maven-based and gradle-based projects here.
>
> My solution then was to compile the library of the maven-based project
> (ambit2), deploy the resultant jars to our internal maven repository, and
> pull those jars from the internal repository for the gradle-based project.
> I'd be happy to know if there is an easier way to do this...?
>
> Do you consider it a problem that the library (heavily) depends on the
> cdk-legacy module? In other words, will the cdk-legacy module potentially
> be removed in an upcoming CDK 3 release?
>
> On Mon, Aug 15, 2022 at 9:07 PM John Mayfield 
> wrote:
>
>> Do you import via gradle/maven repos? Or are you using local paths? I
>> didn't actually know the public release was still on CDK 1.4.11 but anyways.
>>
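>> It sounds like you've maybe got it working, but basically in Maven you
>> would do something like this with the publicly released projects:
>>
>> <dependencies>
>>   ...
>>   <dependency>
>>     <groupId>ambit</groupId>
>>     <artifactId>ambit2-core</artifactId>
>>     <version>2.4.9</version>
>>   </dependency>
>> </dependencies>
>> <dependencyManagement>
>>   <dependencies>
>>     <dependency>
>>       <groupId>org.openscience.cdk</groupId>
>>       <artifactId>cdk-core</artifactId>
>>       <version>2.7.1</version>
>>     </dependency>
>>     <dependency>
>>       <groupId>org.openscience.cdk</groupId>
>>       <artifactId>cdk-isomorphism</artifactId>
>>       <version>2.7.1</version>
>>     </dependency>
>>     ... etc etc
>>   </dependencies>
>> </dependencyManagement>
>>
>> The dependencyManagement section basically says that if any dependencies
>> (ambit in this case) use CDK, then rather than using the version they
>> declare you should use v2.7.1. You would need to do this for all the
>> modules it uses. I tried to find the equivalent in gradle but no luck.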
>> However you should be able to do something like this where you just
>> declare everything and the build tool will use the version you declare:
>>
>> dependencies {
>>implementation 'ambit:ambit2-core:2.4.9'
>>implementation 'org.openscience.cdk:cdk-core:2.7.1'
>>implementation 'org.openscience.cdk:cdk-isomorphism:2.7.1'
>>... etc etc
>> }
>>
>> On Mon, 15 Aug 2022 at 06:58, Uli Fechner  wrote:
>>
>>> It has been a while since I have used maven, so I am not quite sure if I
>>> understand correctly what you are referring to with the usage of
>>> <dependencyManagement>. The project I want to use the ambit2 library as a
>>> dependency with uses gradle; at the moment, it directly depends on the jars
>>> that I compiled from the modified ambit library version.
>>>
>>> Here is a text export of the backward dependency analysis of the
>>> cdk-legacy jar:
>>>
>>> 
>>>   >> path="$MAVEN_REPOSITORY$/org/openscience/cdk/cdk-legacy/2.7.1/cdk-legacy-2.7.1.jar!/org/openscience/cdk/geometry/GeometryTools.class">
>>> >> path="$PROJECT_DIR$/ambit2-all/ambit2-rendering/src/main/java/ambit2/rendering/CompoundImageTools.java"
>>> />
>>> >> path="$PROJECT_DIR$/ambit2-all/ambit2-jchempaint/src/main/java/ambit2/jchempaint/editor/MoleculeEditAction.java"
>>> />
>>>   
>>>   >> path="$MAVEN_REPOSITORY$/org/openscience/cdk/cdk-legacy/2.7.1/cdk-legacy-2.7.1.jar!/org/openscience/cdk/isomorphism/matchers/smarts/AliphaticAtom.class">
>>> >> path="$PROJECT_DIR$/ambit2-all/ambit2-smarts/src/main/java/ambit2/smarts/smirks/SmartsMatch.java"
>>> />
>>> >> path="$PROJECT_DIR$/ambit2-all/ambit2-sln/src/main/java/ambit2/sln/io/SLN2ChemObject.java"
>>> />
>>> >> path="$PROJECT_DIR$/ambit2-all/ambit2-smarts/src/main/java/ambit2/smarts/SmartsHelper.java"
>>> />
>>

Re: [Cdk-user] reaction smarts (i.e., smirks) support

2022-08-15 Thread John Mayfield
/smarts/SmartsBondExpression.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-smarts/src/main/java/ambit2/smarts/SmartsHelper.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-smarts/src/main/java/ambit2/smarts/SingleNonAromaticBond.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-smarts/src/main/java/ambit2/smarts/SingleBondAromaticityNotSpecified.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-smarts/src/main/java/ambit2/smarts/SmartsManager.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-smarts/src/main/java/ambit2/smarts/smirks/SmartsMatch.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-smarts/src/main/java/ambit2/smarts/DoubleBondAromaticityNotSpecified.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-smarts/src/main/java/ambit2/smarts/SmartsParser.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-smarts/src/main/java/ambit2/smarts/DoubleNonAromaticBond.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-sln/src/main/java/ambit2/sln/SLNBond.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-smarts/src/main/java/ambit2/smarts/IsomorphismTester.java"
> />
>   
>path="$MAVEN_REPOSITORY$/org/openscience/cdk/cdk-legacy/2.7.1/cdk-legacy-2.7.1.jar!/org/openscience/cdk/smiles/smarts/parser/SMARTSParser.class">
>  path="$PROJECT_DIR$/ambit2-all/ambit2-smarts/src/test/java/ambit2/smarts/test/AutomaticTestUtilities.java"
> />
>   
>path="$MAVEN_REPOSITORY$/org/openscience/cdk/cdk-legacy/2.7.1/cdk-legacy-2.7.1.jar!/org/openscience/cdk/smiles/smarts/SMARTSQueryTool.class">
>  path="$PROJECT_DIR$/ambit2-all/ambit2-descriptors/src/main/java/ambit2/descriptors/fingerprints/PubChemFingerprinterAmbitSmarts.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-smarts/src/main/java/ambit2/smarts/query/SmartsPatternCDK.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-smarts/src/test/java/ambit2/smarts/processors/StructureKeysBitSetGeneratorTest.java"
> />
>   
>path="$MAVEN_REPOSITORY$/org/openscience/cdk/cdk-legacy/2.7.1/cdk-legacy-2.7.1.jar!/org/openscience/cdk/templates/MoleculeFactory.class">
>  path="$PROJECT_DIR$/ambit2-all/ambit2-db/src/test/java/ambit2/db/processors/test/DbDescriptorWriterTest.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-core/src/test/java/ambit2/core/processors/test/InchiProcessorTest.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-db/src/test/java/ambit2/db/search/test/QuerySimilarityBitSetTest.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-ui/src/test/java/ambit2/ui/test/QueryBrowserTest.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-rendering/src/test/java/ambit2/rendering/RendererTest.java"
> />
>  path="$PROJECT_DIR$/ambit2-apps/ambit2-www/src/test/java/ambit2/rest/test/similarity/SimilarityResourceTest.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-namestructure/src/test/java/ambit2/namestructure/test/Name2StructureProcessorTest.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-smi23d/src/test/java/ambit2/smi23d/test/CommandShellTest.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-db/src/test/java/ambit2/db/search/test/QuerySimilarityStructureTest.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-core/src/test/java/ambit2/core/processors/test/HydrogenAdderProcessorTest.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-model/src/test/java/ambit2/similarity/measure/test/AtomEnvironmentDistanceTest.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-core/src/test/java/ambit2/core/processors/test/AtomConfiguratorProcessorTest.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-db/src/test/java/ambit2/db/search/test/QueryCombinedTest.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-core/src/test/java/ambit2/core/external/test/CommandShellTest.java"
> />
>  path="$PROJECT_DIR$/ambit2-apps/ambit2-www/src/test/java/ambit2/rest/test/query/SmartsResourceTest.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-tautomers/src/test/java/ambit2/tautomers/test/TautomersVisualisationTest.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-core/src/test/java/ambit2/core/obabel/test/TestOpenBabelShell.java"
> />
>  path="$PROJECT_DIR$/ambit2-all/ambit2-db/src/test/java/ambit2/db/processors/test/DbDescriptorValuesWriterTest.java"
> />
>  pa

Re: [Cdk-user] Generation of FCFP Fingerprint

2022-07-25 Thread John Mayfield
Hi Woon Yee,

The method is correct; you can emit them as hexadecimal and pad with zeros.
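
A minimal sketch of that formatting, assuming the hashes are the signed
32-bit ints returned by getHash(k) as in the code quoted below. The class and
method names are hypothetical; only String.format from the JDK is used. The
%08x conversion prints an int as unsigned two's-complement hexadecimal,
zero-padded to eight digits:

```java
final class HashHex {

    // Formats a 32-bit hash as exactly eight lowercase hex digits.
    static String toHex(int hash) {
        return String.format("%08x", hash);
    }

    public static void main(String[] args) {
        // A few of the hash values reported in this thread.
        int[] hashes = {-1212393386, 0, 3, 629394235, 824716024};
        for (int hash : hashes) {
            System.out.println(toHex(hash));
        }
    }
}
```

Negative hashes therefore come out as ordinary eight-digit hex strings
rather than with a leading minus sign.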

John

On Mon, 25 Jul 2022 at 10:11, Egon Willighagen 
wrote:

>
> Dear Woon Yee,
>
> you can use the getFingerprint() method instead.
>
> Egon
>
> On Mon, 25 Jul 2022 at 10:59, #NG WOON YEE# via Cdk-user <
> cdk-user@lists.sourceforge.net> wrote:
>
>> Dear Helpdesk,
>>
>>
>>
>> I was using CDK (version 2.7) to generate FCFP4 and 6 for the compound
>> butyramide (
>> https://www.ebi.ac.uk/chembl/api/data/molecule/CHEMBL1231396.sdf) and
>> ethanol (https://www.ebi.ac.uk/chembl/api/data/molecule/CHEMBL545.sdf)
>> from their MolFiles which I got from CHEMBL. I was using the following
>> commands in CDK:
>>
>>
>> ---
>> package ecfp;
>>
>> import java.io.*;
>> import com.opencsv.CSVReader;
>> import com.opencsv.CSVReaderBuilder;
>> import com.opencsv.CSVWriter;
>> import com.opencsv.exceptions.CsvException;
>> import java.util.Arrays;
>> import java.util.List;
>> import java.io.FileInputStream;
>> import java.io.IOException;
>>
>> import org.openscience.cdk.exception.CDKException;
>> import org.openscience.cdk.fingerprint.CircularFingerprinter;
>> import org.openscience.cdk.fingerprint.ExtendedFingerprinter;
>> import org.openscience.cdk.fingerprint.ICountFingerprint;
>> import org.openscience.cdk.interfaces.IAtomContainer;
>> import org.openscience.cdk.interfaces.IChemObjectBuilder;
>> import org.openscience.cdk.io.MDLV2000Reader;
>> import org.openscience.cdk.silent.SilentChemObjectBuilder;
>>
>> public class main{
>>     public static void main(String[] args) throws CDKException, IOException {
>>         String filename = "C:\\Users\\NGWO0001\\Downloads\\CHEMBL545.sdf.txt";
>>         FileInputStream in = new FileInputStream(filename);
>>         MDLV2000Reader reader = new MDLV2000Reader(in);
>>         IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
>>         IAtomContainer mol = reader.read(bldr.newAtomContainer());
>>
>>         CircularFingerprinter fingerprinter0 = new CircularFingerprinter(
>>             CircularFingerprinter.CLASS_FCFP4
>>         );
>>
>>         System.out.println("FCFP4 Ethanol:");
>>         ICountFingerprint result0 = fingerprinter0.getCountFingerprint(mol);
>>         for (int k=0, n = result0.numOfPopulatedbins(); k < n; ++k) {
>>             String ans4 = "";
>>             ans4 += result0.getHash(k);
>>             ans4 += " " + result0.getCount(k);
>>             System.out.printf("%s\n",ans4);
>>         }
>>
>>         reader.close();
>>     }
>> }
>> ---
>>
>>
>>
>> The results I got were:
>>
>>
>>
>> FCFP4 Butyramide:
>>
>> -1393198889 1
>>
>> -1212393386 1
>>
>> -1131767167 2
>>
>> 0 4
>>
>> 2 1
>>
>> 3 1
>>
>> 425233353 1
>>
>> 785469695 1
>>
>> 824716024 1
>>
>> 994111779 1
>>
>> 1429107614 1
>>
>>
>>
>> FCFP6 Butyramide:
>>
>> -1393198889 1
>>
>> -1212393386 1
>>
>> -1131767167 2
>>
>> 0 4
>>
>> 2 1
>>
>> 3 1
>>
>> 425233353 1
>>
>> 785469695 1
>>
>> 824716024 1
>>
>> 994111779 1
>>
>> 1429107614 1
>>
>>
>>
>> FCFP4 Ethanol:
>>
>> -1212393386 1
>>
>> 0 2
>>
>> 3 1
>>
>> 629394235 1
>>
>> 824716024 1
>>
>>
>>
>> FCFP6 Ethanol:
>>
>> -1212393386 1
>>
>> 0 2
>>
>> 3 1
>>
>> 629394235 1
>>
>> 824716024 1
>>
>>
>> I think these results may not be right since I thought that fingerprints
>> are supposed to be a series of hash and so they ought to be a series of
>> fixed-length integers. However, as you see in the results I got, for
>> example, for the FCFP6 for ethanol, one is 10-digits long while others are
>> single digits and 9-digits long.
>>
>>
>>
>> Can you please tell me what I am doing wrong?
>>
>>
>>
>> Thanking you in advance for your assistance and time.
>>
>>
>>
>> Best regards,
>>
>> Woon Yee
>> ___
>> Cdk-user mailing list
>> Cdk-user@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/cdk-user
>>
>
>
> --
> 
> Super happy with this new eLife paper describing an Open Science project
> where we discuss 260 thousand natural products and where they came from,
> all 700 thousand pairs linked to their primary literature: "The LOTUS
> initiative for open knowledge management in natural products research",
> https://doi.org/10.7554/elife.70780
>
> -
> E.L. Willighagen
> Department of Bioinformatics - BiGCaT
> Maastricht University (http://www.bigcat.unimaas.nl/)
> Twitter/Mastodon: @egonwillighagen 
>  / @egonw 
> 

Re: [Cdk-user] SMILES with @@ to 3D layout

2022-07-25 Thread John Mayfield
My view is that this is currently beyond CDK's capabilities; that part is
not fit for purpose and needs a rewrite. It sort of works but, as you see,
it's a bit half-baked.

John

On Mon, 25 Jul 2022 at 08:23, dpoly  wrote:

> My goal is to create 3D depictions of small organic molecules, monomers
> and polymers, such as sugars, amino acids, polypeptides and the like.
> Stereo chemistry is important.
>
>
>
> I can use SMILES strings to create a molecular structure and 2D layout
> (StructureDiagramGenerator) just fine. 3D (ModelBuilder3D) not so much.
>
>
>
> ModelBuilder3D has a note saying that stereochemistry is a “standing
> problem”.
>
>
>
> Am I on the right track and just need to persevere, or should I look
> elsewhere? Any hints?
>
>
>
> Regards
>
> David M Bennett FACS
> *--*
>
> *Polygamo –** Programming Languages and Players for Games and Puzzles **--
> http://www.polyomino.com *
>
>
> ___
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user
>
___
Cdk-user mailing list
Cdk-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-user


Re: [Cdk-user] reaction smarts (i.e., smirks) support

2022-07-19 Thread John Mayfield
There "should" not be any breaking changes from 2.2.1 -> 2.7.1. If there
are, try including the cdk-legacy module.

Best regards
John

On Tue, 19 Jul 2022 at 08:14, Uli Fechner  wrote:

> Hi,
>
> I want to apply a SMIRKS pattern to a molecule and get the products as a
> result. My understanding is that CDK does not support this at the moment.
> Is that correct?
>
> The ambit-smirks package by IdeaConsult seems to offer the functionality I
> am looking for. However, it declares a dependency on cdk 2.2.1 in its
> latest pom (
> https://github.com/ideaconsult/ambit-mirror/blob/master/ambit2-all/pom.xml).
> We use cdk 2.7.1 and I don't want to downgrade. Does anyone have any
> experience with bumping the cdk version in ambit and compiling it yourself?
>
> Any help is much appreciated.
>
> Best
> Uli
> ___
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user
>


Re: [Cdk-user] Unable to create isotope patterns from formulas with non-default isotopes

2022-03-22 Thread John Mayfield
Yes sorry was just responding... I'm not even sure how you pass a String in
there... I get a compiler error. Essentially the IsotopeFactory is really a
dictionary of IUPAC isotopes. That method does do what is documented but a
tolerance of 1.0 is possibly too high. I still get null but would think it
should be something like this (below).

To be honest I've never used the isotope generator. Are you sure you don't
just need the most-abundant molecular weight (MAMW)? There is an efficient
function for that.

IChemObjectBuilder builder = SilentChemObjectBuilder.getInstance();
MolecularFormulaRange mfRange = new MolecularFormulaRange();

// isotope.min / isotope.max are placeholders carried over from the original code
// 15.99491462 is the exact mass of 16O
IIsotope o16 = builder.newAtom();
o16.setAtomicNumber(IAtom.O);
o16.setExactMass(15.99491462);
mfRange.addIsotope(o16, isotope.min, isotope.max);
// 16.9991317 is the exact mass of 17O
IIsotope o17 = builder.newAtom();
o17.setAtomicNumber(IAtom.O);
o17.setExactMass(16.9991317);
mfRange.addIsotope(o17, isotope.min, isotope.max);

MolecularFormulaGenerator generator =
    new MolecularFormulaGenerator(builder, 626.75, 626.9, mfRange);
IMolecularFormulaSet formulaSet = new MolecularFormulaSet();

IMolecularFormula formula = generator.getNextFormula();

IsotopePatternGenerator isotope_pattern_generator =
    new IsotopePatternGenerator(0.01);
IsotopePattern isotope_pattern =
    isotope_pattern_generator.getIsotopes(formula);
for (IsotopeContainer isotope_container : isotope_pattern.getIsotopes()) {
  System.out.println("Isotope: mass:" + isotope_container.getMass() +
                     " intensity:" + isotope_container.getIntensity());
}




On Tue, 22 Mar 2022 at 22:39, Rob Smith <2robsm...@gmail.com> wrote:

> Hi John,
> I took a look at the CDK source. Sure enough, the pattern generator just
> uses the element symbol to pull the generic isotopes from the atom (see
> lines 139-141 here
> https://github.com/cdk/cdk/blob/ce36b8af886e08e6015d15a2724001193f71be76/tool/formula/src/main/java/org/openscience/cdk/formula/IsotopePatternGenerator.java#L135
> ).
>
> That will always overwrite any custom isotopes you have.
>
> On Tue, Mar 22, 2022 at 3:59 PM Rob Smith <2robsm...@gmail.com> wrote:
>
>> I'm happy to. Just to clear up that I'm not doing something stupid,
>> here's a gist. Let me know and I can still put it up on github.
>>
>> I've chopped my code down to a minimal example below. The formula
>> generator search below will return hits that include 16C and 17C (though
>> this snippet just calls getNextFormula once, so it returns just one). The
>> masses of these hits will be in the correct range (626.75-626.9). But if
>> you print the isotope pattern generated, you will get masses outside of
>> that range, because they correspond to the same molecular formula (e.g.,
>> O38 or O39) with the normal C isotopes (12C, 13C, etc), and therefore
>> incorrect masses for each of the isotope pattern masses. In other words, if
>> you were to create a vanilla O38 or O39 molecule and generate the isotope
>> pattern (which, presumably uses the naturally-occuring ratio of 12C : 13C :
>> 14C,
>> 98.9 : 1.1 : 0.0001) you will get the same result as you do below when
>> you generate the isotope patterns for nonstandard isotopes (here, O38 is
>> 16C : 17C, 36:3). This suggests to me that the IsotopePatternGenerator does
>> not actually use the isotope objects in the IMolecularFormula provided, but
>> rather goes off the natural abundances of the atoms in the formula,
>> irrespective of the isotopes set in the IMolecularFormula.
>>
>> public class FormulaGenerator {
>>   public static void main(String[] args){
>> IChemObjectBuilder builder = SilentChemObjectBuilder.getInstance();
>> MolecularFormulaRange mfRange = new MolecularFormulaRange();
>>
>> try{
>> IsotopeFactory isotopeFactory = Isotopes.getInstance();
>>
>> IIsotope i = isotopeFactory.getIsotope("O", "15.99491462", 1.0);
>> mfRange.addIsotope(i, isotope.min, isotope.max);
>>
>> i = isotopeFactory.getIsotope("O", "16.9991317", 1.0);
>> mfRange.addIsotope(i, isotope.min, isotope.max);
>> } catch (IOException e){
>> System.exit(0);
>> }
>>
>> MolecularFormulaGenerator generator = new
>> MolecularFormulaGenerator(builder, 626.75, 626.9, mfRange);
>> IMolecularFormulaSet formulaSet = new MolecularFormulaSet();
>>
>> IMolecularFormula formula = generator.getNextFormula();
>>
>> IsotopePatternGenerator isotope_pattern_generator = new
>> IsotopePatternGenerator(0.01);
>>  

Re: [Cdk-user] Unable to create isotope patterns from formulas with non-default isotopes

2022-03-22 Thread John Mayfield
Sounds like a possible bug, but it's not completely clear what you're trying
to do. Could you post an issue with example code on GitHub?

Thanks,
John

On Tue, 22 Mar 2022 at 21:06, Rob Smith <2robsm...@gmail.com> wrote:

> Hello,
> I've built a cdk workflow to generate IMolecularFormulas from a mass. The
> workflow allows ranges of counts of different isotopes for different atoms.
> So far, so good.
>
> The next step of the workflow is to generate IsotopePatterns for each
> generated IMolecularFormula.
>
> I'm probably missing something, but I have spent quite some time writing
> different approaches and verifications of what is happening.
>
> I have confirmed that the IMolecularFormulas outputted from the formula
> generator step are producing non-natural isotope abundances as expected to
> fit the range criteria. However, the IsotopePatternGenerator seems to
> ignore these isotopes and print the pattern as if only natural abundances
> are being used.
>
> Does the IsotopePatternGenerator not support non-standard isotope
> abundances? Or am I missing a step here?
>
> I'd appreciate any insight anyone may have, as I've exhausted all code
> permutations I can think of.
>
> Thank you for your time.
> -Rob
> ___
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user
>


Re: [Cdk-user] Layout for Haworth and Fischer projections

2022-01-23 Thread John Mayfield
We do not currently have this feature and I was always sceptical about
adding it since there would be information loss for other tools, as you
cannot reliably store them. ChemDraw (and maybe CML) can store them okay
but MOLfile cannot (since it primarily stores stereochemistry with wedges).
The code in StereoElementFactory is about working out stereochemistry from
a drawn input, but it is not 100% reliable since the input can often be very
ambiguous.

John

On Sun, 23 Jan 2022 at 10:27, dpoly  wrote:

> Hi All
>
>
>
> New to CDK, but most impressed by what I see.
>
>
>
> I can build molecules just fine and depict them in ball and stick 2D
> (using my own rendering). Very nice!
>
>
>
> But I can’t figure out how to do layout for Haworth and Fischer
> projections. I found StereoElementFactory but really no idea how to use
> it.
>
>
>
> Any pointers much appreciated.
>
>
>
> Regards
>
> David M Bennett FACS
> *--*
>
> *Polygamo –** Programming Languages and Players for Games and Puzzles **--
> http://www.polyomino.com *
>
>
> ___
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user
>


Re: [Cdk-user] About CDK and point groups

2022-01-21 Thread John Mayfield
Not that I know of. You could add it to the cdk-group module that Gillean
contributed. That module is very clean, so provided it integrates well it
sounds good.

John

On Thu, 20 Jan 2022 at 13:18, Mehmet Aziz YİRİK 
wrote:

> Dear John,
>
> I have a question: do we already have a class for point group detection
> in CDK? I did not see anything except some notes about point groups. If we
> don't, would it be useful to develop such a class?
>
> Kind regards
> Aziz
>


Re: [Cdk-user] Question on CDK hydrogen bond acceptors

2021-11-26 Thread John Mayfield
2 days ago :-)

https://github.com/cdk/cdk/commit/dfbc32822e7d471bfb5a60aaf39a701371541280

On Fri, 26 Nov 2021 at 19:11, Andres Fernando Bernal Escobar <
andresf.bern...@utadeo.edu.co> wrote:

> Hi John,
> Thanks for the fix. When will it be merged into the main branch?
> Andrés
>
> El vie, 26 de nov. de 2021 a la(s) 03:21, John Mayfield (
> john.wilkinson...@gmail.com) escribió:
>
>> Hi Andres,
>>
>> Excellent analysis - thank you. Good to see that the recent Phenol change
>> should bring things more in agreement.
>>
>> John
>>
>> On Wed, 24 Nov 2021 at 21:29, Andres Fernando Bernal Escobar <
>> andresf.bern...@utadeo.edu.co> wrote:
>>
>>> Hello John, thanks for your answer. I ran a quick comparison between CDK
>>> and PubChem, with a few hand-picked molecules. These are the results:
>>> https://docs.google.com/spreadsheets/d/1yl3b05W319ZQW5K9TZf0iMYHbPoJyP5BV8QMLhf1kLE/edit?usp=sharing
>>>
>>> I split the molecules in four subsets. The first comprises seemingly
>>> non-problematic molecules: carboxylic acids, amines, aliphatic esters,
>>> aliphatic ethers. In these cases CDK, PubChem and my own intuition are all
>>> in agreement.
>>>
>>> The second subset comprises molecules where I think CDK is wrong and
>>> PubChem is correct: phenols. This is due to the issue that you corrected in
>>> the branch you linked.
>>>
>>> The third subset comprises molecules where I think CDK is correct and
>>> PubChem is wrong: aromatic ethers, amides, nitro compounds. In the case of
>>> aromatic ethers, we know CDK explicitly introduces a correction to exclude
>>> aromatic ether oxygens from the HB acceptors count. I am not a specialist,
>>> but I understand there are sound reasons to make this exception. PubChem
>>> doesn't seem to implement it. In the case of amides and nitro compounds I
>>> don't quite understand what is going on with PubChem, but CDK's answer
>>> seems the correct one to me.
>>>
>>> The last subset comprises aromatic esters (acyloxy substituents). I
>>> honestly don't know what is correct in this case. Are oxygen atoms from
>>> aromatic esters also an exception, just as those from aromatic ethers? That
>>> would mean CDK is right. Otherwise, another correction is needed to make
>>> sure CDK excludes no oxygens on aromatic rings other than those of ethers.
>>>
>>> El mar, 23 de nov. de 2021 a la(s) 04:27, John Mayfield (
>>> john.wilkinson...@gmail.com) escribió:
>>>
>>>> Thanks for your email. I've always thought the CDK HBond acceptor/donor
>>>> code is a little wonky and needs investigating. I don't have time to look
>>>> deeply at it but yes my reading of this is it doesn't check for the ether
>>>> oxygen correctly. If someone was inclined checking CDK's (and RDKit's)
>>>> values with PubChem would be a quick project that may provide some insight
>>>> onto missed cases and disagreements.
>>>>
>>>> I've made a change here to get the correct value for phenol:
>>>> https://github.com/cdk/cdk/compare/bug/hbondacceptor?expand=1
>>>>
>>>> On Fri, 15 Oct 2021 at 11:27, Guillermo Restrepo <
>>>> guillermo.restr...@mis.mpg.de> wrote:
>>>>
>>>>> We are working with some descriptors taken from Reaxys database, which
>>>>> according to its owner are computed using your CDK library. We found
>>>>> something unexpected and would very much appreciate it if you could
>>>>> help
>>>>> us to understand.
>>>>>
>>>>> We noted that some phenols are reported as having 0 hydrogen bond
>>>>> acceptors, whereas we expected them to have at least one. We checked
>>>>> CDK
>>>>> source code and found this comment on
>>>>> HBondAcceptorCountDescriptor.java:
>>>>>
>>>>> The following groups are counted as hydrogen bond acceptors:
>>>>> - any oxygen where the formal charge of the oxygen is non-positive
>>>>> (i.e.
>>>>> formal charge <= 0) except
>>>>>- an aromatic ether oxygen (i.e. an ether oxygen that is
>>>>> adjacent
>>>>> to at least one aromatic carbon)
>>>>> - an oxygen that is adjacent to a nitrogen
>>>>> - any nitrogen where the formal charge of the nitrogen is non-positive
>>>>> (i.e. formal charge <= 0) exc

Re: [Cdk-user] Question on CDK hydrogen bond acceptors

2021-11-26 Thread John Mayfield
Hi Andres,

Excellent analysis - thank you. Good to see that the recent Phenol change
should bring things more in agreement.

John

On Wed, 24 Nov 2021 at 21:29, Andres Fernando Bernal Escobar <
andresf.bern...@utadeo.edu.co> wrote:

> Hello John, thanks for your answer. I ran a quick comparison between CDK
> and PubChem, with a few hand-picked molecules. These are the results:
> https://docs.google.com/spreadsheets/d/1yl3b05W319ZQW5K9TZf0iMYHbPoJyP5BV8QMLhf1kLE/edit?usp=sharing
>
> I split the molecules in four subsets. The first comprises seemingly
> non-problematic molecules: carboxylic acids, amines, aliphatic esters,
> aliphatic ethers. In these cases CDK, PubChem and my own intuition are all
> in agreement.
>
> The second subset comprises molecules where I think CDK is wrong and
> PubChem is correct: phenols. This is due to the issue that you corrected in
> the branch you linked.
>
> The third subset comprises molecules where I think CDK is correct and
> PubChem is wrong: aromatic ethers, amides, nitro compounds. In the case of
> aromatic ethers, we know CDK explicitly introduces a correction to exclude
> aromatic ether oxygens from the HB acceptors count. I am not a specialist,
> but I understand there are sound reasons to make this exception. PubChem
> doesn't seem to implement it. In the case of amides and nitro compounds I
> don't quite understand what is going on with PubChem, but CDK's answer
> seems the correct one to me.
>
> The last subset comprises aromatic esters (acyloxy substituents). I
> honestly don't know what is correct in this case. Are oxygen atoms from
> aromatic esters also an exception, just as those from aromatic ethers? That
> would mean CDK is right. Otherwise, another correction is needed to make
> sure CDK excludes no oxygens on aromatic rings other than those of ethers.
>
> El mar, 23 de nov. de 2021 a la(s) 04:27, John Mayfield (
> john.wilkinson...@gmail.com) escribió:
>
>> Thanks for your email. I've always thought the CDK HBond acceptor/donor
>> code is a little wonky and needs investigating. I don't have time to look
>> deeply at it but yes my reading of this is it doesn't check for the ether
>> oxygen correctly. If someone was inclined checking CDK's (and RDKit's)
>> values with PubChem would be a quick project that may provide some insight
>> onto missed cases and disagreements.
>>
>> I've made a change here to get the correct value for phenol:
>> https://github.com/cdk/cdk/compare/bug/hbondacceptor?expand=1
>>
>> On Fri, 15 Oct 2021 at 11:27, Guillermo Restrepo <
>> guillermo.restr...@mis.mpg.de> wrote:
>>
>>> We are working with some descriptors taken from Reaxys database, which
>>> according to its owner are computed using your CDK library. We found
>>> something unexpected and would very much appreciate it if you could help
>>> us to understand.
>>>
>>> We noted that some phenols are reported as having 0 hydrogen bond
>>> acceptors, whereas we expected them to have at least one. We checked CDK
>>> source code and found this comment on HBondAcceptorCountDescriptor.java:
>>>
>>> The following groups are counted as hydrogen bond acceptors:
>>> - any oxygen where the formal charge of the oxygen is non-positive (i.e.
>>> formal charge <= 0) except
>>>- an aromatic ether oxygen (i.e. an ether oxygen that is adjacent
>>> to at least one aromatic carbon)
>>> - an oxygen that is adjacent to a nitrogen
>>> - any nitrogen where the formal charge of the nitrogen is non-positive
>>> (i.e. formal charge <= 0) except
>>> - a nitrogen that is adjacent to an oxygen
>>>
>>> The way we understood it, this means that phenols should have at least
>>> one hydrogen bond acceptor. But further down in the same file, these
>>> lines seem to specify otherwise:
>>>
>>> // looking for suitable oxygen atoms
>>>  else if (atom.getAtomicNumber() == IElement.O &&
>>> atom.getFormalCharge() <= 0) {
>>>  //excluding oxygens that are adjacent to a nitrogen or
>>> to an aromatic carbon
>>>  List neighbours = ac.getConnectedBondsList(atom);
>>>  for (IBond bond : neighbours) {
>>>  IAtom neighbor = bond.getOther(atom);
>>>  if (neighbor.getAtomicNumber() == IElement.N ||
>>>  (neighbor.getAtomicNumber() == IElement.C &&
>>>   neighbor.isAromatic() &&
>>>   bond.getOrder() != IBond

Re: [Cdk-user] Question on CDK hydrogen bond acceptors

2021-11-23 Thread John Mayfield
Thanks for your email. I've always thought the CDK HBond acceptor/donor
code is a little wonky and needs investigating. I don't have time to look
deeply at it, but yes, my reading of this is that it doesn't check for the
ether oxygen correctly. If someone were so inclined, checking CDK's (and
RDKit's) values against PubChem would be a quick project that may provide
some insight into missed cases and disagreements.

I've made a change here to get the correct value for phenol:
https://github.com/cdk/cdk/compare/bug/hbondacceptor?expand=1
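
To illustrate the discrepancy, here is a rough sketch using a toy atom model. Everything in it (the Atom class, both methods, the "ether" test of zero hydrogens and two heavy neighbours) is a hypothetical stand-in for illustration; it is not CDK API and not necessarily what the change in the branch does. It only covers the oxygen rule: acceptorAsCoded mirrors the quoted loop, acceptorAsDocumented follows the quoted comment.

```java
import java.util.ArrayList;
import java.util.List;

public class HBondAcceptorSketch {

    /** Toy atom for illustration only (hypothetical, not a CDK class). */
    public static final class Atom {
        final String element;
        final int formalCharge;
        final boolean aromatic;
        final int hydrogens;                                  // attached hydrogens
        final List<Atom> neighbours = new ArrayList<>();      // heavy-atom neighbours
        final List<Boolean> doubleBonded = new ArrayList<>(); // parallel to neighbours

        Atom(String element, int formalCharge, boolean aromatic, int hydrogens) {
            this.element = element;
            this.formalCharge = formalCharge;
            this.aromatic = aromatic;
            this.hydrogens = hydrogens;
        }

        void bond(Atom other, boolean isDouble) {
            neighbours.add(other);
            doubleBonded.add(isDouble);
            other.neighbours.add(this);
            other.doubleBonded.add(isDouble);
        }
    }

    /** Mirrors the quoted loop: any O next to N, or singly bonded to an aromatic C, is skipped. */
    public static boolean acceptorAsCoded(Atom o) {
        if (!o.element.equals("O") || o.formalCharge > 0) return false;
        for (int i = 0; i < o.neighbours.size(); i++) {
            Atom n = o.neighbours.get(i);
            if (n.element.equals("N")) return false;
            if (n.element.equals("C") && n.aromatic && !o.doubleBonded.get(i)) return false;
        }
        return true;
    }

    /** Follows the quoted comment: only an ether O next to an aromatic C is skipped. */
    public static boolean acceptorAsDocumented(Atom o) {
        if (!o.element.equals("O") || o.formalCharge > 0) return false;
        // Assumption: "ether oxygen" means no hydrogens and two heavy neighbours.
        boolean ether = o.hydrogens == 0 && o.neighbours.size() == 2;
        for (Atom n : o.neighbours) {
            if (n.element.equals("N")) return false;
            if (ether && n.element.equals("C") && n.aromatic) return false;
        }
        return true;
    }

    public static Atom phenolOxygen() {          // Ar-OH
        Atom o = new Atom("O", 0, false, 1);
        o.bond(new Atom("C", 0, true, 0), false);
        return o;
    }

    public static Atom anisoleOxygen() {         // Ar-O-CH3
        Atom o = new Atom("O", 0, false, 0);
        o.bond(new Atom("C", 0, true, 0), false);
        o.bond(new Atom("C", 0, false, 3), false);
        return o;
    }

    public static Atom ethanolOxygen() {         // CH2-OH
        Atom o = new Atom("O", 0, false, 1);
        o.bond(new Atom("C", 0, false, 2), false);
        return o;
    }

    public static void main(String[] args) {
        System.out.println("phenol  coded=" + acceptorAsCoded(phenolOxygen())
                + " documented=" + acceptorAsDocumented(phenolOxygen()));
        System.out.println("anisole coded=" + acceptorAsCoded(anisoleOxygen())
                + " documented=" + acceptorAsDocumented(anisoleOxygen()));
        System.out.println("ethanol coded=" + acceptorAsCoded(ethanolOxygen())
                + " documented=" + acceptorAsDocumented(ethanolOxygen()));
    }
}
```

In this toy model the two rules only disagree on the phenol oxygen, which is the case reported in the question. How an aromatic ester oxygen should be treated is left open here.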

On Fri, 15 Oct 2021 at 11:27, Guillermo Restrepo <
guillermo.restr...@mis.mpg.de> wrote:

> We are working with some descriptors taken from Reaxys database, which
> according to its owner are computed using your CDK library. We found
> something unexpected and would very much appreciate it if you could help
> us to understand.
>
> We noted that some phenols are reported as having 0 hydrogen bond
> acceptors, whereas we expected them to have at least one. We checked CDK
> source code and found this comment on HBondAcceptorCountDescriptor.java:
>
> The following groups are counted as hydrogen bond acceptors:
> - any oxygen where the formal charge of the oxygen is non-positive (i.e.
> formal charge <= 0) except
>- an aromatic ether oxygen (i.e. an ether oxygen that is adjacent
> to at least one aromatic carbon)
> - an oxygen that is adjacent to a nitrogen
> - any nitrogen where the formal charge of the nitrogen is non-positive
> (i.e. formal charge <= 0) except
> - a nitrogen that is adjacent to an oxygen
>
> The way we understood it, this means that phenols should have at least
> one hydrogen bond acceptor. But further down in the same file, these
> lines seem to specify otherwise:
>
> // looking for suitable oxygen atoms
> else if (atom.getAtomicNumber() == IElement.O &&
>          atom.getFormalCharge() <= 0) {
>     // excluding oxygens that are adjacent to a nitrogen or
>     // to an aromatic carbon
>     List<IBond> neighbours = ac.getConnectedBondsList(atom);
>     for (IBond bond : neighbours) {
>         IAtom neighbor = bond.getOther(atom);
>         if (neighbor.getAtomicNumber() == IElement.N ||
>             (neighbor.getAtomicNumber() == IElement.C &&
>              neighbor.isAromatic() &&
>              bond.getOrder() != IBond.Order.DOUBLE))
>             continue atomloop;
>     }
>     hBondAcceptors++;
> }
>
> Is this intended, or is it a bug, or are we misunderstanding something?
>
>
>
> ___
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user
>


Re: [Cdk-user] how to split a molecule into target substructure at specified positions

2021-11-23 Thread John Mayfield
Hi, It's not quite clear exactly what you want to do. Do you want to pull
the matched substructure out of a molecule? This often isn't needed as you
either have the pattern as is or you just need the atom/bond indexes for
downstream processing.

Anyway, if you really want to do that, you don't need to fragment the
molecule; just add the matched atoms/bonds to another atom container:

IAtomContainer mol = ...;
SmartsPattern pat = ...;
IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
for (Map<IChemObject, IChemObject> map : pat.matchAll(mol).toAtomBondMap()) {
    IAtomContainer subgraph = bldr.newAtomContainer();
    // the map values are the matched atoms/bonds of mol
    for (IChemObject cobj : map.values()) {
        if (cobj instanceof IAtom)
            subgraph.addAtom((IAtom) cobj);
        else if (cobj instanceof IBond)
            subgraph.addBond((IBond) cobj);
    }
}

You can do it more like you described, but it's less efficient since
removals are slow and you would need to work out how to handle
more than one pattern etc. Note you can't copy the molecule, as the copy
would then have different object references.

IAtomContainer mol = ...;
SmartsPattern pat = ...;
IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
for (Map<IChemObject, IChemObject> mapping : pat.matchAll(mol).toAtomBondMap()) {
    Set<IAtom> atomsToDelete = new HashSet<>();
    for (IAtom atom : mol.atoms()) {
        // the map goes query -> target, so atoms of mol are the values
        if (!mapping.containsValue(atom))
            atomsToDelete.add(atom);
    }
    // note: bonds take care of themselves
    for (IAtom atom : atomsToDelete)
        mol.removeAtom(atom);
}

On Sat, 20 Nov 2021 at 12:46, biotech7  wrote:

> hi,everyone!
> one molecule has ring system(isolated rings or fused rings). firstly, by
> using *RingSearch() *to find rings. secondly, locate functional groups
> linked to the rings as the final target submolecule. to reach this goal, by
> utilizing * findSubstructure()* pattern (plus other algorithms) to search
> and locate all linked functional groups' positions. when this
> goal accomplished, break all bonds at these positions and acquire target
> substructure.
> *take a detailed example :*
> i want to split this molecule( *UniversalSmiles*)
> *CCCOC(=O)C1=C(C=C(C(=C1)S(=O)(=O)NC(=O)NC2=NC(=NC(=N2)OC)C)OCCCl)N(=O)=*O
> at atom positions: 30-31, 1-2,16-17 to get this target substructure
> *CCOC(=O)C1=C(C=C(C(=C1)S(=O)(=O)N)OC)N(=O)=O*
>
> currently, after getting all the posistions, i use
> *FragmentUtils.splitMolecule()*(in a protected class) method to split
> molecules. but this strategy only supports step by step splitting and
> requires reconstructing  structure as final tartget substructure.
>
> the question is : is there an algorithm(or a strategy) to split the
> molecule at all the positions(30-31, 1-2,16-17) only once to fully get the
> substructure(*CCOC(=O)C1=C(C=C(C(=C1)S(=O)(=O)N)OC)N(=O)=O*) without
> reconstruction?
> this issue has trapped me for many days.
>
> Regards!
>
>
>
> ___
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user
>


Re: [Cdk-user] Support for chemical standardization

2021-10-05 Thread John Mayfield
Hi Staffan,

Not much in the base CDK but I think AMBIT has some utilities for it
http://ambit.sourceforge.net/

On Mon, 27 Sept 2021 at 20:24, Staffan Arvidsson McShane <
staffan.arvids...@gmail.com> wrote:

> I'm looking for a good way to perform chemical standardization for later
> descriptor and QSAR modeling, to hopefully get more robust results. I've
> searched the CDK javadoc and the cdkbook but haven't found any good matches
> and thus wonder if there's some support within CDK for this task? My
> expertise within the chemical field is limited so I would in the best of
> worlds use something with good defaults and not requiring much tuning or
> know-how and that still yields good results. If none exists within CDK, is
> there a good alternative, possibly within Java or JVM-based languages that
> anyone can recommend?
>
> Best,
> Staffan Arvidsson McShane
> ___
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user
>


Re: [Cdk-user] Mol2 file to SMILES

2021-03-28 Thread John Mayfield
You should use the *CircularFingerprinter* for similarity.

On Sun, 28 Mar 2021 at 08:39, Sub Jae Shin  wrote:

> To John Mayfield
>
> Hi, I found the drugbank id property from AtomContainer's getproperties
> method, so that I could specify which atom container indicates which drug.
>
> I think my goal to get drug-drug similarity has been achieved in my guess.
>
> package com.company;
> import org.openscience.cdk.ChemFile;
> import org.openscience.cdk.exception.CDKException;
> import org.openscience.cdk.fingerprint.Fingerprinter;
> import org.openscience.cdk.fingerprint.IBitFingerprint;
> import org.openscience.cdk.fingerprint.IFingerprinter;
> import org.openscience.cdk.graph.rebond.Bspt;
> import org.openscience.cdk.interfaces.IAtomContainer;
> import org.openscience.cdk.interfaces.IChemFile;
> import org.openscience.cdk.io.MDLV2000Reader;
> import org.openscience.cdk.similarity.Tanimoto;
> import org.openscience.cdk.tools.manipulator.ChemFileManipulator;
>
> import java.io.*;
> import java.lang.reflect.Array;
> import java.util.ArrayList;
> import java.util.List;
> import java.util.Map;
>
> public class Main {
>
> public static void main(String[] args) {
> try {
>
> InputStream structures = new 
> FileInputStream("../data/drugbank/structures.sdf");
> MDLV2000Reader reader = new MDLV2000Reader(structures);
> IChemFile file = reader.read(new ChemFile());
> //Where can I find drugbank id?
>
> Fingerprinter finger = new Fingerprinter();
> List AtomData = 
> ChemFileManipulator.getAllAtomContainers(file);
> int count = AtomData.size();
> ArrayList df = new ArrayList<>();
>
> for(int i = 0; i < count; ++i) {
> ArrayList list = new ArrayList<>();
> IAtomContainer acReference = AtomData.get(i);
> Map refProperties = acReference.getProperties();
> list.add(refProperties.get("DATABASE_ID"));
> for(int j = 0; j < count; ++j) {
> IAtomContainer acStructure = AtomData.get(j);
> Map structProperties = acStructure.getProperties();
> System.out.println("REF DATABASE_ID : " + 
> refProperties.get("DATABASE_ID") +
> "-" + "COMP DATABASE_ID" + 
> structProperties.get("DATABASE_ID") + " similarity is now calculating");
> double similarity = cdkCalculateTanimotoCoef(finger, 
> acReference, acStructure);
> list.add(similarity);
> }
> df.add(list);
> }
> FileWriter result_csv = new 
> FileWriter("../data/drugbank/drug_drug_sim.csv");
>
> for(ArrayList a : df){
> String row = "";
> for(int i = 0; i < a.size(); ++i) {
> if(i == a.size() - 1) {
> row = row + a.get(i).toString() + "\n";
> }
> else {
> row = row + a.get(i).toString() + ",";
> }
> }
> // System.out.println(row);
> result_csv.write(row);
> }
>
> result_csv.close();
>
> //System.out.println(acReference.toString());
>
>
> } catch (FileNotFoundException | CDKException e) {
> System.out.println(e.getMessage());
> } catch (IOException e) {
> e.printStackTrace();
> }
> }
>
> public static double cdkCalculateTanimotoCoef(IFingerprinter 
> fingerprinter, IAtomContainer acReference, IAtomContainer acStructure ) {
>
> double ret = 0.0;
>
> try {
>
> IBitFingerprint fpReference = 
> fingerprinter.getBitFingerprint(acReference);
>
> //Tanimoto-score
> IBitFingerprint fpStructure = 
> fingerprinter.getBitFingerprint(acStructure);
> ret = Tanimoto.calculate(fpReference, fpStructure);
>
> } catch (Exception ex) {
> //...
> }
>
> return ret;
> }
> }
>
>
> I hope this code result matches with my goal.
>
> I always thank you all, cdk developers.
>
> Sincerely
> Seopjae Shin
>
>
> On Fri, Mar 26, 2021 at 6:36 PM John Mayfield 
> wrote:
>
>> Do you have a mol2 file or a SMILES file? It's not clear. Mol2 support
>> isn't great in the CDK mainly because it's 

Re: [Cdk-user] Mol2 file to SMILES

2021-03-26 Thread John Mayfield
Do you have a mol2 file or a SMILES file? It's not clear. Mol2 support
isn't great in the CDK, mainly because it's more a compchem/modelling format
than a cheminformatics one; cheminformatics primarily uses SMILES or MOLfile.

Presuming you know how to read line by line from a file, here is an example
from SMILES:

IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
> // load from SMILES and compute the ECFP (circular) fingerprint
> IFingerprinter fpr = new CircularFingerprinter();
> SmilesParser smipar = new SmilesParser(bldr);
> List<String> smiles = Arrays.asList("Clc1c1",
> "Fc1c1",
> "Ic1c1",
> "Clc1n1");
> List<BitSet> fps = new ArrayList<>();
> for (String smi : smiles) {
> IAtomContainer mol = smipar.parseSmiles(smi);
> fps.add(fpr.getBitFingerprint(mol).asBitSet());
> }
> // print N^2 comparison table
> for (int j = 0; j < fps.size(); j++)
> System.out.print("," + smiles.get(j));
> System.out.print('\n');
> for (int i = 0; i < fps.size(); i++) {
> System.out.print(smiles.get(i));
> for (int j = 0; j < fps.size(); j++) {
> System.out.printf(",%.3f", Tanimoto.calculate(fps.get(i),
> fps.get(j)));
> }
> System.out.print('\n');
> }


,Clc1c1,Fc1c1,Ic1c1,Clc1n1
Clc1c1,1.000,0.368,0.368,0.292
Fc1c1,0.368,1.000,0.368,0.192
Ic1c1,0.368,0.368,1.000,0.192
Clc1n1,0.292,0.192,0.192,1.000

There are far more efficient ways of doing this, and for a large comparison
table use ChemFP: https://chemfp.com/.
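
One simple saving, sketched here under the assumption that the fingerprints
are already plain java.util.BitSet values (as asBitSet() returns in the
example above): Tanimoto is symmetric, so each pair only needs computing
once. The class and method names are illustrative and are not part of the
CDK.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Illustrative only: plain java.util.BitSet arithmetic, no CDK classes involved.
class TanimotoTable {

    // Tanimoto = |A and B| / |A or B|. Two empty fingerprints score 0.0 here
    // (a convention chosen for this sketch).
    static double tanimoto(BitSet a, BitSet b) {
        BitSet and = (BitSet) a.clone();
        and.and(b);
        BitSet or = (BitSet) a.clone();
        or.or(b);
        int union = or.cardinality();
        return union == 0 ? 0.0 : (double) and.cardinality() / union;
    }

    // Fills the full N x N table but computes each pair once, since T(i,j) == T(j,i).
    static double[][] table(List<BitSet> fps) {
        int n = fps.size();
        double[][] t = new double[n][n];
        for (int i = 0; i < n; i++) {
            t[i][i] = tanimoto(fps.get(i), fps.get(i));
            for (int j = i + 1; j < n; j++) {
                t[i][j] = t[j][i] = tanimoto(fps.get(i), fps.get(j));
            }
        }
        return t;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(1); a.set(2); a.set(3);
        BitSet b = new BitSet();
        b.set(2); b.set(3); b.set(4);
        double[][] t = table(Arrays.asList(a, b));
        // two shared bits out of four distinct bits
        System.out.println(t[0][1]);
    }
}
```

This roughly halves the number of comparisons; it does not replace a
dedicated tool for really large tables.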

On Wed, 24 Mar 2021 at 06:42, Stesycki, Manuel 
wrote:

> Good morning,
>
> Use this class for Tanimoto calculations:
>  org.openscience.cdk.similarity.Tanimoto (see doc:
> http://cdk.github.io/cdk/latest/docs/api/index.html)
>
> You could do something like this to calculate your Tanimoto score:
>
> public static double cdkCalculateTanimotoCoef(IFingerprinter
> fingerprinter, IAtomContainer acReference, IAtomContainer acStructure ) {
>
> double ret = 0.0;
>
> try {
>
> IBitFingerprint fpReference = fingerprinter.getBitFingerprint(
> acReference);
>
> //Tanimoto-score
> IBitFingerprint fpStructure = fingerprinter.getBitFingerprint(
> acStructure);
> ret = Tanimoto.calculate(fpReference, fpStructure);
>
> } catch (Exception ex) {
> //...
> }
>
> return ret;
> }
>
>
>
> Best regards,
>Manuel Stesycki
>
> IT
>0208 / 306-2146
>Physikbau, Büro 117
>stesy...@mpi-muelheim.mpg.de
>
> Max-Planck-Institut für Kohlenforschung
>Kaiser-Wilhelm-Platz 1
>D-45470 Mülheim an der Ruhr
>http://www.kofo.mpg.de/de
>
> On 24.03.2021 at 04:55, Sub Jae Shin wrote:
>
> To CDK developers.
>
> Hello, I'm trying to get drug-drug similarity by Tanimoto score.
>
> I'm a beginner with CDK and Java, so I'm stuck on turning a SMILES
> file into the arguments that the Tanimoto calculate method expects.
>
> package com.company;
> import org.openscience.cdk.ChemFile;
> import org.openscience.cdk.exception.CDKException;
> import org.openscience.cdk.interfaces.IChemFile;
> import org.openscience.cdk.io.SMILESReader;
> import java.io.*;
>
> public class Main {
>
> public static void main(String[] args) {
> try {
>
> InputStream mol2DataStream = new 
> FileInputStream("../data/drugbank/structure.smiles");
> SMILESReader reader = new SMILESReader(mol2DataStream);
> IChemFile file = reader.read(new ChemFile());
>
> } catch (FileNotFoundException | CDKException e) {
> System.out.println(e.getMessage());
> }
> }
> }
>
> Sincerely
> Seopjae Shin.
>
>


Re: [Cdk-user] How to install Chemistry development kit after installing apache-maven-3.6.3

2020-12-30 Thread John Mayfield
Sorry, I misread your email and missed that you had the JAR downloaded. You
do not need Maven unless your project will use Maven to build. You just need
to add the JAR to the classpath.

java -cp cdk-2.0.jar YourClassName

or in an IDE (e.g. Eclipse/IntelliJ) you would configure this from a menu
option.
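
A small sketch for checking that the JAR really is on the classpath. It
assumes the class org.openscience.cdk.interfaces.IAtomContainer is present
in your CDK JAR; ClasspathCheck is a made-up helper, not something the CDK
ships.

```java
// Illustrative helper, not part of the CDK: reports whether a class can be located.
class ClasspathCheck {

    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default class name is an assumption about what the CDK JAR contains.
        String name = args.length > 0
                ? args[0]
                : "org.openscience.cdk.interfaces.IAtomContainer";
        System.out.println(name + (isOnClasspath(name) ? " found" : " NOT found on the classpath"));
    }
}
```

Compile it and run it with both the JAR and the current directory on the
classpath, e.g. java -cp cdk-2.0.jar:. ClasspathCheck (use ; instead of :
on Windows).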

On Wed, 30 Dec 2020 at 09:27, Winod Dhamnekar 
wrote:

> Hello,
>
> John May,
>
> Sir,
>
> What should be the contents of pom.xml file in cdk directory? What
> are the modelversion and snapshot? If you know it, please guide me in this
> regard.
>
> Cdk beginner user,
>
> WMD
>
>
> *From: *John May 
> *Sent: *30 December 2020 13:50
> *To: *Winod Dhamnekar 
> *Cc: *cdk-u...@lists.sf.net
> *Subject: *Re: [Cdk-user] How to install Chemistry development kit after
> installing apache-maven-3.6.3
>
>
>
> You need to run mvn install from the CDK directory, the install just
> builds the code and puts the JAR files in the maven repo directory
> (~/.m2/repository on Linux not sure where it is on Windows).
>
>
>
> If you just want to use the CDK you can actually just download the release
> jar from GitHub or let maven download them for you.
>
> Also note that CDK is a programming library and not an application.
>
>
>
> - John
>
>
>
> On 30 Dec 2020, at 06:25, Winod Dhamnekar 
> wrote:
>
> 
>
> Hello,
> I have java , java development kit 32 bit and 64 bit installed on my
> laptop. I have installed apache maven 3.6.3 and its path is C: \Program
> Files\apache-maven-3.6.3. I have downloaded chemistry development kit
> cdk-2.0.jar and it is Program Files directory.
>
> On my laptop, JAVA_HOME environment variable is set to C:\Program
> Files\Java\jdk-15.0.1. But at the time of installation of cdk-2.0.jar by
> giving command mvn install at the command prompt, following screen appears.
>
> <55C2E9251DE648ABBED995B9CC4CFC30.png>
>
>   How to overcome this difficulty?
>
>
>
> Cdk beginner user,
>
>
>
> WMD
>


Re: [Cdk-user] Wrong molecular formula?

2020-12-03 Thread John Mayfield
Hi Manuel,

Chris is right, unfortunately the ChemDraw export isn't quite correct. It
is actually possible to represent multi-attach in V3000 but it's not used
here. The more common problem is that there is simply a random bond into
the middle of a ring. I've done a fair bit of work on ChemDraw processing (
https://nextmovesoftware.com/blog/2016/07/28/sketchy-sketches/), the
biggest issue is the ChemDraw chemical formula/abbreviation parsing, for
example K2CO3 has a peroxide, HATU is a "[H]*[3H][U]", etc (I show more
examples in the poster).

NextMove has a commercial tool to generate CXSMILES, for your example note
the *m:* part on the end that captures the positional variation.

[john@harbinger:Praline]% java -jar exec/target/praline.jar convert
> ~/Downloads/structure.cdx --cxsmi
> [Ru]([P](CCC1=CC=CC=C1)(C2C2)C3C3)(Cl)(Cl)*.C1(=CC=C(C=C1)C(C)C)C
> |m:24:25.26.27.28.29.30| structure Molecule/Specific/High/+PVar


CDK can read and handle this, we actually do get the formula wrong still
though (will fix that).

OpenBabel has a FOSS ChemDraw parser, one option could be to modify that
and parse your examples to get the info and then generate the
MOLfile/CXSMILES. The parsing is easy *NodeType="MultipleAttach"
Attachments="{id1} {id2} .."* where the id's are node ids. Unfortunately I
don't think they have the data structures to represent it so it would be a
fair bit of work other than handling these fields.

All the best,
John

On Wed, 2 Dec 2020 at 15:05, Christoph Steinbeck <
christoph.steinb...@uni-jena.de> wrote:

> Dear Manuel,
>
> if you open the mol file in a text editor, there are clearly 31 C atoms in
> the file.
> So the CDK is “right”. I also opened the file in Marvin Sketch and it
> output the analysis below.
>
> ChemDraw uses a fishy trick, as it seems, to create the illusion of a
> multi-center attachment. Clearly, they focus on publication-ready drawing
> of chemical structures and not on creating correct file representations of
> the chemistry. Fact is that the end of the line to the center of the
> benzene ring is a carbon atom and nothing else.
>
> Kind regards,
>
> Chris
>
> —
> Prof. Dr. Christoph Steinbeck
> Analytical Chemistry - Cheminformatics and Chemometrics
> Friedrich-Schiller-University Jena, Germany
> Phone Secretariat: +49-3641-948171
> http://cheminf.uni-jena.de
> http://orcid.org/-0001-6966-0814
>
> What is man but that lofty spirit - that sense of enterprise.
> ... Kirk, "I, Mudd," stardate 4513.3..
>
>
>
>
>
> > On 2. Dec 2020, at 14:38, Stesycki, Manuel 
> wrote:
> >
> > Dear CDK users,
> >
> > we are using CDK version 2.3 in our application.
> > As a user tried to add a structure (see attachment) we found a
> difference in the molecular formula of the structure.
> >
> > The original structure was drawn with ChemDraw 18.
> > A multi-center attachment was added to the structure and ChemDraw shows
> this molecular formula: C30H46Cl2PRu
> >
> > Whereas our application takes the mol-version of the cdx-file and
> computes this formula: C31H49Cl2PRu
> > To get the formula we use this piece of code:
> >
> > IMolecularFormula form =
> MolecularFormulaManipulator.getMolecularFormula(mol);
> > sumFormula = MolecularFormulaManipulator.getString(form);
> >
> > Did we miss something when creating the AtomContainer?
> > We create the atomcontainer directly by parsing the mol-file:
> > try (StringReader sr = new StringReader(molFile); MDLV2000Reader mr =
> new MDLV2000Reader(sr, mode)) {
> >
> > AtomContainer mol = new AtomContainer();
> > AtomContainer ac = mr.read(mol);
> > }
> >
> > Maybe someone can give us a hint, what we are doing wrong.
> >
> > Best regards,
> >Manuel Stesycki
> >
> > IT
> >0208 / 306-2146
> >Physikbau, Büro 117
> >stesy...@mpi-muelheim.mpg.de
> >
> > Max-Planck-Institut für Kohlenforschung
> >Kaiser-Wilhelm-Platz 1
> >D-45470 Mülheim an der Ruhr
> >http://www.kofo.mpg.de/de
> >


Re: [Cdk-user] atom typing without atom type name

2020-09-21 Thread John Mayfield
Neither the SMILES parser nor other IO (except maybe CML) will assign atom
types for you - you need to do this yourself with:

AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(methane);

Atom types are an annotation on top of a molecule. There are different atom
types we could assign - CDK atom types are just one set, ALOGP is a
different set (for example). Before CDK 1.4 basically everything was built
on the assumption that CDK atom types were present - this is no longer the
case.

On Sun, 20 Sep 2020 at 22:44, Rajarshi Guha  wrote:

> Hi, the following code is failing because the parsed molecule has no atom
> type names. The calculate() method tries to identify atom types from the
> atom's type name, but this seems circular. Unless I assign atom types, where
> does the type name come from?
>
> public class CDKVolumeTest {
> public static void main(String[] args) throws CDKException {
> SmilesParser sp = new 
> SmilesParser(DefaultChemObjectBuilder.getInstance());
> IAtomContainer mol = sp.parseSmiles("CCO");
>
> double vol = VABCVolume.calculate(mol);
> }
> }
>
>
> --
> Rajarshi Guha | http://blog.rguha.net | @rguha 
>


Re: [Cdk-user] CDK in PyCharm IDE

2020-07-05 Thread John Mayfield
Hi Stuart,

Sorry for the late reply

> Finally, while I was hoping it would be easy to use CDK right in
> PyCharm/Python I now see that I will have to install it separately and use
> the command line.
> I guess I don’t know, is there command line functionality built into the
> CDK jar file?


You don't need to go via the command line, IDEA
<https://www.jetbrains.com/idea/> is the equivalent of PyCharm, in fact I
think it came first and I believe PyCharm reuses most of the UI :-). We
don't have any command line utilities, and what you're trying to do is quite
specific, so I couldn't imagine having a utility for it even if we did
provide those. It's best to think of CDK like NumPy - it's a set of tools
for you to build something with. Since you sound like you're more familiar
with Python, why not use RDKit or Open Babel?

> So, I have a project where I am converting a lot of data to RDF and as a
> result I am adding compounds to a graph DB with lots of descriptors and
> identifiers.
> Thus, the first thing I want to do with CDK is to get descriptors for
> compounds, preferably using the InChIKey (but other identifiers if needed).


I presume you mean Resource Description Framework (RDF), but note that the
CTfile Reaction Data File (RDF)
<http://help.accelrysonline.com/ulm/onelab/1.0/content/ulm_pdfs/direct/reference/ctfileformats2016.pdf>
is a common cheminformatics format, so it can get confusing. Egon knows more
about RDF but I believe Apache JENA <https://jena.apache.org/> will help
here - I think we actually use it in the CDK already.

> Thus, the first thing I want to do with CDK is to get descriptors for
> compounds, preferably using the InChIKey (but other identifiers if needed).


You can't read a structure back from an InChIKey (it is a hash of the
InChI), but perhaps I misunderstand.

> Second, I want to store the atoms and bonds of the molecular graph in the
> data (as RDF).  I am therefore really interested in either:
> - getting access to the InChI canonicalization algorithm (if CDK has a
> version of that outside of the InChI code) OR
> - obtaining a .mol file where the connection layer atoms are labelled with
> the atom numbers from the canonicalization routine
> My idea is to see if I can generate the InChI string from the molecular
> graph using semantic inferencing


We call into the standard InChI native code, however we do provide
convenient access to the InChI atom numbers:
http://cdk.github.io/cdk/latest/docs/api/index.html?org/openscience/cdk/graph/invariant/InChINumbersTools.html.
It's not too hard to pull these out and set them on a MOLfile or SMILES.

John


On Thu, 25 Jun 2020 at 18:08, Chalk, Stuart  wrote:

> John
>
> Thanks for reply.
>
> So, I have a project where I am converting a lot of data to RDF and as a
> result I am adding compounds to a graph DB with lots of descriptors and
> identifiers.
> Thus, the first thing I want to do with CDK is to get descriptors for
> compounds, preferably using the InChIKey (but other identifiers if needed).
>
> Second, I want to store the atoms and bonds of the molecular graph in the
> data (as RDF).  I am therefore really interested in either:
> - getting access to the InChI canonicalization algorithm (if CDK has a
> version of that outside of the InChI code) OR
> - obtaining a .mol file where the connection layer atoms are labelled with
> the atom numbers from the canonicalization routine
> My idea is to see if I can generate the InChI string from the molecular
> graph using semantic inferencing
>
> Finally, while I was hoping it would easy to use CDK right in
> PyCharm/Python I now see that I will have to install it separately and use
> the command line.
> I guess I don’t know, is there command line functionality built into the
> CDK jar file?
>
> Any advice much appreciated…
> Stuart
>
> On Jun 24, 2020, at 7:26 PM, John Mayfield 
> wrote:
>
> Hi Stuart,
>
> We have some small snippets here (
> https://github.com/cdk/cdk/wiki/Toolkit-Rosetta)
> but most of our doc is geared towards having at least some familiarity with
> writing and using Java libraries. Saying you don't see a plugin in the IDE
> for CDK is like saying you don't see a petrol cap on an electric car.
> Removing the Python vs JAVA issues - CDK is a chemistry toolkit (distinctly
> not an application) - you link in the JAR file to your own code and use
> it's components. IDE plugins help you code, formatting, syntax highlighting
> etc.
>
> Now skipping over a lot of details you can link a Java JAR in different
> ways either manually via the classpath or more commonly via

Re: [Cdk-user] CDK in PyCharm IDE

2020-06-24 Thread John Mayfield
Hi Stuart,

We have some small snippets here (
https://github.com/cdk/cdk/wiki/Toolkit-Rosetta) but most of our doc is
geared towards having at least some familiarity with writing and using Java
libraries. Saying you don't see a plugin in the IDE for CDK is like saying
you don't see a petrol cap on an electric car. Removing the Python vs Java
issues - CDK is a chemistry toolkit (distinctly not an application) - you
link in the JAR file to your own code and use its components. IDE plugins
help you code: formatting, syntax highlighting etc.

Now skipping over a lot of details you can link a Java JAR in different
ways either manually via the classpath or more commonly via a build tool
(e.g. maven/gradle/ant). Before going further I think it would be better to
start from what you hope to do, i.e. why CDK within the PyCharm IDE? Are
you just wanting to have a play around or was there a task you wanted
to accomplish?

John

On Wed, 24 Jun 2020 at 23:26, Chalk, Stuart  wrote:

> Markus
>
> Thanks for that great idea!
>
> Sadly, I don’t find CDK in the Plugin marketplace for PyCharm.
> All the plugins are coding related of course...
>
> Regards,
> Stuart
>
> On Jun 24, 2020, at 5:35 PM, Markus Sitzmann 
> wrote:
>
> Hi Stuart,
>
> PyCharm is a specialized version of the IntelliJ IDE for Python. IntelliJ
> itself (and PyCharm) is written in Java, so the solution for CDK should be
> using IntelliJ. I am not sure if you can get Java extensions for PyCharm
> (the professional versions of PyCharm and IntelliJ even require separate
> licenses, but there is a community version of both)
>
> Markus
>
> ---
> Markus Sitzmann
>
>
> On 24. Jun 2020, at 23:20, Chalk, Stuart  wrote:
>
>  I am interested in using CDK within the PyCharm IDE.  I see that the
> recommendation on the CDK website is to use Cinfony, however the package on
> PyPi does not work and there does not seem to be a version after 2012.
> If anyone has expertise/advice/suggestions please let me know…
>
> I hope everyone out there in CDK land is doing OK given the current
> situation...
>
> Stuart Chalk, Ph.D.
> Professor of Chemistry
> Department of Chemistry, Building 50, Room 3514,
> University of North Florida
> 1 UNF Drive, Jacksonville, FL 32224 USA
> ORCID: -0002-0703-7776
> P: 904-620-1938
> F: 904-620-3535
> E: sch...@unf.edu
> W: http://www.unf.edu/coas/chemistry/faculty/Stuart_Chalk.aspx
>


Re: [Cdk-user] How can I save a wavy bond in a file?

2020-04-02 Thread John Mayfield
In that case it should be stored as the "up_or_down" and looking at NCDK it
looks correct to me:

MDL reader

https://github.com/kazuyaujihara/NCDK/blob/master/NCDK/IO/MDLV2000Reader.cs#L1359

MDL writer

https://github.com/kazuyaujihara/NCDK/blob/master/NCDK/IO/MDLV2000Writer.cs#L577

Again this could be a JChemPaint issue using the old "MDLReader/Writer".

On Wed, 1 Apr 2020 at 14:17, Shao Frankro  wrote:

> Actually I am using C# with NCDK <https://github.com/kazuyaujihara/NCDK>,
> but I can't find the Bond "Display" property in it; maybe it hasn't been
> updated yet.
> Anyway, must I write my own file format writer? I thought there would be a
> serialization method in CDK/NCDK.   : )
>
> --
>  John Mayfield 
>  2020-4-1 20:00
>  Re: [Cdk-user] How can I save a wavy bond in a file?
>
> Yes they are round tripped by MDL (BondStereo.UP_OR_DOWN) for sure,
> however could be a JChemPaint issue - and that's no longer actively
> developed.
>
> Corner-case but are you using a JChemPaint release or built one yourself?
> If you've mixed in a new version of CDK it may be tripped up with the new
> Bond "Display" property.
>
>
> https://github.com/cdk/cdk/blob/master/base/interfaces/src/main/java/org/openscience/cdk/interfaces/IBond.java#L137
>
> On Wed, 1 Apr 2020 at 04:31, Shao Frankro  wrote:
>
> Dear all,
> I am reading the source code of JChemPaint and writing a molecule
> editor based on CDK. I found that wavy bonds become solid bonds when I
> save them to an MDL/CML format file and reopen it in JChemPaint.
> So I want to know if there's any file format or any way to save
> all the information in CDK's memory model to a file so that I could reload
> it (maybe like serialization and deserialization). If not, what is the
> easiest way to achieve this?
>
> Thanks for your help!
>
>
> PS: I found that CDKSourceCodeWriter may save the information in CDK's
> memory model, but I don't know how to use this code and it also
> loses stereo information.


Re: [Cdk-user] How can I save a wavy bond in a file?

2020-04-01 Thread John Mayfield
Yes they are round tripped by MDL (BondStereo.UP_OR_DOWN) for sure, however
could be a JChemPaint issue - and that's no longer actively developed.

Corner-case but are you using a JChemPaint release or built one yourself?
If you've mixed in a new version of CDK it may be tripped up with the new
Bond "Display" property.

https://github.com/cdk/cdk/blob/master/base/interfaces/src/main/java/org/openscience/cdk/interfaces/IBond.java#L137

On Wed, 1 Apr 2020 at 04:31, Shao Frankro  wrote:

> Dear all,
> I am reading the source code of JChemPaint and writing a molecule
> editor based on CDK. I found that wavy bonds become solid bonds when I
> save them to an MDL/CML format file and reopen it in JChemPaint.
> So I want to know if there's any file format or any way to save
> all the information in CDK's memory model to a file so that I could reload
> it (maybe like serialization and deserialization). If not, what is the
> easiest way to achieve this?
>
> Thanks for your help!
>
>
> PS: I found that CDKSourceCodeWriter may save the information in CDK's
> memory model, but I don't know how to use this code and it also
> loses stereo information.


Re: [Cdk-user] Raw fingerprints impossible to calculate

2020-02-25 Thread John Mayfield
Okay,

I'm going to presume you want to search the data.. to retrieve similar
compounds or substructures. If not then just store the hexadecimal
fingerprint.

It's not impossible to do searching in MongoDB, see a talk from Matt Swain
<https://matt-swain.com/blog/2014-06-03-chemical-similarity-search-in-mongodb>,
... and my follow ups:
http://efficientbits.blogspot.com/2014/11/memory-mapped-fingerprint-index-part-i.html
,
http://efficientbits.blogspot.com/2014/12/memory-mapped-fingerprint-index-part-ii.html
.

However my view is (as I make clear in those blog posts) MongoDB is the
wrong technology for this, but you could convert your binary
fingerprint to a vector. In fact *toString* works well:

System.out.println(new
> Fingerprinter().getBitFingerprint(mol).asBitSet().toString());


{43, 46, 51, 60, 65, 70, 72, 86, 95, 99, 111, 114, 123, 128, 144, 157, 158,
161, 166, 174, 185, 188, 204, 213, 222, 253, 271, 275, 278, 311, 315, 320,
335, 364, 371, 379, 390, 409, 446, 449, 463, 486, 498, 520, 523, 535, 540,
565, 574, 586, 588, 611, 628, 632, 637, 647, 649, 655, 667, 725, 742, 756,
770, 793, 845, 859, 865, 918, 951, 954, 959, 1015}

You could then use and/or queries to find fingerprint subsets or compute
Tanimotos etc.
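
A minimal sketch of the vector idea, assuming the fingerprint has been
stored as the text that BitSet.toString() prints (as in the output above).
It parses that text back into a list of set-bit indices, which is the
JSON-friendly form, and rebuilds a BitSet from it. The class and method
names are illustrative and not CDK API.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Illustrative only: converts between BitSet.toString() text and index vectors.
class BitVectorCodec {

    // Turns "{43, 46, 51}" into the list [43, 46, 51]; "{}" gives an empty list.
    static List<Integer> parseIndices(String text) {
        List<Integer> indices = new ArrayList<>();
        String body = text.trim();
        body = body.substring(1, body.length() - 1).trim(); // drop the braces
        if (body.isEmpty()) {
            return indices;
        }
        for (String token : body.split(",")) {
            indices.add(Integer.parseInt(token.trim()));
        }
        return indices;
    }

    // Rebuilds the BitSet from the stored index vector.
    static BitSet toBitSet(List<Integer> indices) {
        BitSet bits = new BitSet();
        for (int index : indices) {
            bits.set(index);
        }
        return bits;
    }

    public static void main(String[] args) {
        List<Integer> vector = parseIndices("{43, 46, 51, 60}");
        System.out.println(vector.size() + " bits set");
        System.out.println(toBitSet(vector));
    }
}
```

Storing the index list rather than an opaque object keeps the on-bits
visible to the database, which is what subset-style queries need.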

John

On Mon, 24 Feb 2020 at 13:44, Maria Sorokina 
wrote:

> I see the problem.
>
> Well, originally, I wanted to check what the raw fingerprints look like.
> I am storing all the data (and the fingerprints) in MongoDB, and I am still
> not sure whether, if I save the BitFingerprints directly in there (which is
> possible when the field has an Object type), they will be parseable by
> the mongo engine as fingerprints (without retrieving them to be read with
> CDK). So this is why I wanted to check the raw fingerprints, as they should
> be a more JSON-friendly format, and the mongo engine would be able to read
> those integers and strings for further similarity search.
>
> Kind regards,
> Maria
>
>
> Dr. Maria Sorokina
> Steinbeck Research Group
> Analytical Chemistry - Cheminformatics and Chemometrics
> Friedrich-Schiller-University Jena, Germany
> http://cheminf.uni-jena.de
>
> On 21 Feb 2020 at 19:31, John Mayfield wrote:
>
> Okay looking at it the Substructure fingerprint would be easy to adapt...
> but it's not hard to just count the substructures. Utility code like that
> is difficult to justify, every line is more to maintain.
>
> The other problem is I don't like the fingerprint APIs so it's a toss-up
> between using effort to implement something I (or hopefully someone else)
> will ultimately rewrite in future. "Deprecated on arrival" I believe Egon
> has said before.
>
> On Fri, 21 Feb 2020 at 18:25, John Mayfield 
> wrote:
>
>> What do you think the "raw" fingerprint is? Why would you expect it for
>> the Substructure one?
>>
>> On Fri, 21 Feb 2020 at 09:47, Maria Sorokina 
>> wrote:
>>
>>> I tried in total 7 fingerprinters (PubChem, Substructure, MACCS,
>>> KlekotaRoth, Circular, ShortestPath and Hybridization) and none worked. For
>>> some, I’m not surprised, but I was really expecting to have the raw
>>> fingerprints for the Substructure one
>>>
>>>
>>> Dr. Maria Sorokina
>>> Steinbeck Research Group
>>> Analytical Chemistry - Cheminformatics and Chemometrics
>>> Friedrich-Schiller-University Jena, Germany
>>> http://cheminf.uni-jena.de
>>>
>>> On 21 Feb 2020 at 10:39, John Mayfield wrote:
>>>
>>> ... I do have some patches for an updated fingerprint API stack that
>>> would also add this in to more places. Essentially it was added to the
>>> public API but only implemented in a few places and left as a "ToDo"
>>> elsewhere. Might be something for the hack-a-thon.
>>>
>>> I should say PubChem fingerprints are binary in nature though, so you would
>>> probably never want the RAW version. *getBitFingerprint()* is always
>>> implemented.
>>>
>>> John
>>>
>>> On Fri, 21 Feb 2020 at 09:34, John Mayfield 
>>> wrote:
>>>
>>>> Hi Maria,
>>>>
>>>> Not all fingerprints support the "RAW" and Count options.
>>>>
>>>> John
>>>>
>>>> On Fri, 21 Feb 2020 at 09:31, Maria Sorokina 
>>>> wrote:
>>>>
>>>>> Dear community,
>>>>>
>>>>> It is decidedly substructure search and fingerprinting period of the
>>>>> year!
>>>>>
>>>>> I want to create (to store) raw fingerprints of a range of different
>>>>> fingerprint 

Re: [Cdk-user] Substructure search using ShortestPathFingerprinter

2020-02-25 Thread John Mayfield
Yes good idea, I added a comment at the bottom but it does explicitly say
that at the top.
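
For the two bit sets quoted below, a minimal sketch of the usual screening
test (every query bit must also be set in the target) shows why the search
misses: bit 540 of butane is absent from cyclopentane. The class and helper
names are illustrative, not CDK API; only the bit indices come from the
thread.

```java
import java.util.BitSet;

// Illustrative only: the generic "query bits are a subset of target bits" screen.
class SubsetScreen {

    // A target can only pass the screen if it has every bit the query has.
    static boolean passesScreen(BitSet query, BitSet target) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target); // bits set in the query but not in the target
        return missing.isEmpty();
    }

    static BitSet bits(int... indices) {
        BitSet b = new BitSet();
        for (int i : indices) {
            b.set(i);
        }
        return b;
    }

    public static void main(String[] args) {
        BitSet butane = bits(115, 503, 540, 653, 893);
        BitSet cyclopentane = bits(115, 503, 542, 653, 893);
        // prints false: bit 540 is set for butane but not for cyclopentane
        System.out.println(passesScreen(butane, cyclopentane));
    }
}
```

The screen itself is fine; it simply requires a fingerprint for which a
substructure's bits are always a subset of the superstructure's bits.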

On Tue, 25 Feb 2020 at 08:43, nicepeopleproject 
wrote:

> Thank you!
> The documentation for the ShortestPathFingerprinter class says "Fingerprints
> allow for a fast screening step to exclude candidates for a substructure
> search in a database. They are also a means for determining the similarity
> of chemical structures.". Perhaps it’s worth removing so that there are
> no contradictions.
>
> On Thu, 20 Feb 2020 at 18:28, John Mayfield wrote:
>
>> I've added a warning in the doc, there was already a warning on MACCS 166
>> keys.
>>
>> https://github.com/cdk/cdk/commit/82cb4f8d49283e117696f40d09538c70790a18fd
>>
>> On Thu, 20 Feb 2020 at 15:20, John Mayfield 
>> wrote:
>>
>>> *wrote :-)
>>>
>>> On Thu, 20 Feb 2020 at 15:20, John Mayfield 
>>> wrote:
>>>
>>>> Only *Fingerprinter* or *ExtendedFingerprinter* obey this transitivity
>>>> property.
>>>>
>>>> Relevant post I wrong in 2015:
>>>> https://nextmovesoftware.com/blog/2015/02/16/for-every-fingerprint-optimisation-there-is-an-equal-and-opposite-fingerprint-deterioration/
>>>>
>>>> On Thu, 20 Feb 2020 at 10:44, nicepeopleproject <
>>>> nicepeopleproj...@gmail.com> wrote:
>>>>
>>>>> Hello!
>>>>> I'm trying to implement substructure search. As I understand it, the
>>>>> ShortestPathFingerprinter is suitable for this. I ran into the following
>>>>> problem. I attach two files (in molecules.zip). When using butane.mol as
>>>>> the query, the search should find ciclopentane.mol. For butane I got the BitSet:
>>>>> {115, 503, 540, 653, 893}
>>>>> {115, 503, 542, 653, 893} - for cyclopentane.
>>>>> So I cannot find cyclopentane. Is there a way to make it work?
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Николаев Артём
>>>>>
>>>>
>
> --
> Best regards,
> Николаев Артём
>


Re: [Cdk-user] Raw fingerprints impossible to calculate

2020-02-21 Thread John Mayfield
Okay looking at it the Substructure fingerprint would be easy to adapt...
but it's not hard to just count the substructures. Utility code like that
is difficult to justify, every line is more to maintain.

The other problem is I don't like the fingerprint APIs so it's a toss-up
between using effort to implement something I (or hopefully someone else)
will ultimately rewrite in future. "Deprecated on arrival" I believe Egon
has said before.

On Fri, 21 Feb 2020 at 18:25, John Mayfield 
wrote:

> What do you think the "raw" fingerprint is? Why would you expect it for
> the Substructure one?
>
> On Fri, 21 Feb 2020 at 09:47, Maria Sorokina 
> wrote:
>
>> I tried in total 7 fingerprinters (PubChem, Substructure, MACCS,
>> KlekotaRoth, Circular, ShortestPath and Hybridization) and none worked. For
>> some, I’m not surprised, but I was really expecting to have the raw
>> fingerprints for the Substructure one
>>
>>
>> Dr. Maria Sorokina
>> Steinbeck Research Group
>> Analytical Chemistry - Cheminformatics and Chemometrics
>> Friedrich-Schiller-University Jena, Germany
>> http://cheminf.uni-jena.de
>>
>> On 21 Feb 2020 at 10:39, John Mayfield wrote:
>>
>> ... I do have some patches for an updated fingerprint API stack that
>> would also add this in to more places. Essentially it was added to the
>> public API but only implemented in a few places and left as a "ToDo"
>> elsewhere. Might be something for the hack-a-thon.
>>
>> I should say PubChem fingerprints are binary in nature, though, so you would
>> probably never want the RAW version. *getBitFingerprint()* is always
>> implemented.
>>
>> John
>>
>> On Fri, 21 Feb 2020 at 09:34, John Mayfield 
>> wrote:
>>
>>> Hi Maria,
>>>
>>> Not all fingerprints support the "RAW" and Count options.
>>>
>>> John
>>>
>>> On Fri, 21 Feb 2020 at 09:31, Maria Sorokina 
>>> wrote:
>>>
>>>> Dear community,
>>>>
>>>> It is decidedly substructure search and fingerprinting period of the
>>>> year!
>>>>
>>>> I want to create (to store) raw fingerprints of a range of different
>>>> fingerprint types for a big number of complex molecules (natural products).
>>>>
>>>> For example this:
>>>>
>>>> PubchemFingerprinter pubchemFingerprinter = new PubchemFingerprinter( 
>>>> SilentChemObjectBuilder.getInstance() );
>>>>
>>>> System.out.println(pubchemFingerprinter.getRawFingerprint(myAtomContainer));
>>>>
>>>> For all my molecules I am getting an "UnsupportedOperationException",
>>>> which according to the documentation reflects only the fact that the
>>>> fingerprinter cannot produce the raw fingerprint.
>>>> I am using the latest (2.3) version of the CDK.
>>>> Can anybody help me with this issue?
>>>>
>>>>
>>>> Kind regards,
>>>> Maria
>>>>
>>>>
>>>> Dr. Maria Sorokina
>>>> Steinbeck Research Group
>>>> Analytical Chemistry - Cheminformatics and Chemometrics
>>>> Friedrich-Schiller-University Jena, Germany
>>>> http://cheminf.uni-jena.de
>>>>


Re: [Cdk-user] Raw fingerprints impossible to calculate

2020-02-21 Thread John Mayfield
What do you think the "raw" fingerprint is? Why would you expect it for the
Substructure one?

On Fri, 21 Feb 2020 at 09:47, Maria Sorokina 
wrote:

> I tried in total 7 fingerprinters (PubChem, Substructure, MACCS,
> KlekotaRoth, Circular, ShortestPath and Hybridization) and none worked. For
> some, I’m not surprised, but I was really expecting to have the raw
> fingerprints for the Substructure one.
>
>
> Dr. Maria Sorokina
> Steinbeck Research Group
> Analytical Chemistry - Cheminformatics and Chemometrics
> Friedrich-Schiller-University Jena, Germany
> http://cheminf.uni-jena.de
>
> On 21 Feb 2020 at 10:39, John Mayfield wrote:
>
> ... I do have some patches for an updated fingerprint API stack that would
> also add this in to more places. Essentially it was added to the public API
> but only implemented in a few places and left as a "ToDo" elsewhere. Might
> be something for the hack-a-thon.
>
> I should say PubChem fingerprints are binary in nature, though, so you would
> probably never want the RAW version. *getBitFingerprint()* is always
> implemented.
>
> John
>
> On Fri, 21 Feb 2020 at 09:34, John Mayfield 
> wrote:
>
>> Hi Maria,
>>
>> Not all fingerprints support the "RAW" and Count options.
>>
>> John
>>
>> On Fri, 21 Feb 2020 at 09:31, Maria Sorokina 
>> wrote:
>>
>>> Dear community,
>>>
>>> It is decidedly substructure search and fingerprinting period of the
>>> year!
>>>
>>> I want to create (to store) raw fingerprints of a range of different
>>> fingerprint types for a big number of complex molecules (natural products).
>>>
>>> For example this:
>>>
>>> PubchemFingerprinter pubchemFingerprinter = new PubchemFingerprinter( 
>>> SilentChemObjectBuilder.getInstance() );
>>>
>>> System.out.println(pubchemFingerprinter.getRawFingerprint(myAtomContainer));
>>>
>>> For all my molecules I am getting an "UnsupportedOperationException",
>>> which according to the documentation reflects only the fact that the
>>> fingerprinter cannot produce the raw fingerprint.
>>> I am using the latest (2.3) version of the CDK.
>>> Can anybody help me with this issue?
>>>
>>>
>>> Kind regards,
>>> Maria
>>>
>>>
>>> Dr. Maria Sorokina
>>> Steinbeck Research Group
>>> Analytical Chemistry - Cheminformatics and Chemometrics
>>> Friedrich-Schiller-University Jena, Germany
>>> http://cheminf.uni-jena.de
>>>


Re: [Cdk-user] Raw fingerprints impossible to calculate

2020-02-21 Thread John Mayfield
... I do have some patches for an updated fingerprint API stack that would
also add this in to more places. Essentially it was added to the public API
but only implemented in a few places and left as a "ToDo" elsewhere. Might
be something for the hack-a-thon.

I should say PubChem fingerprints are binary in nature, though, so you would
probably never want the RAW version. *getBitFingerprint()* is always
implemented.

John

On Fri, 21 Feb 2020 at 09:34, John Mayfield 
wrote:

> Hi Maria,
>
> Not all fingerprints support the "RAW" and Count options.
>
> John
>
> On Fri, 21 Feb 2020 at 09:31, Maria Sorokina 
> wrote:
>
>> Dear community,
>>
>> It is decidedly substructure search and fingerprinting period of the year!
>>
>> I want to create (to store) raw fingerprints of a range of different
>> fingerprint types for a big number of complex molecules (natural products).
>>
>> For example this:
>>
>> PubchemFingerprinter pubchemFingerprinter = new PubchemFingerprinter( 
>> SilentChemObjectBuilder.getInstance() );
>>
>> System.out.println(pubchemFingerprinter.getRawFingerprint(myAtomContainer));
>>
>> For all my molecules I am getting an "UnsupportedOperationException",
>> which according to the documentation reflects only the fact that the
>> fingerprinter cannot produce the raw fingerprint.
>> I am using the latest (2.3) version of the CDK.
>> Can anybody help me with this issue?
>>
>>
>> Kind regards,
>> Maria
>>
>>
>> Dr. Maria Sorokina
>> Steinbeck Research Group
>> Analytical Chemistry - Cheminformatics and Chemometrics
>> Friedrich-Schiller-University Jena, Germany
>> http://cheminf.uni-jena.de
>>


Re: [Cdk-user] Raw fingerprints impossible to calculate

2020-02-21 Thread John Mayfield
Hi Maria,

Not all fingerprints support the "RAW" and Count options.

John

On Fri, 21 Feb 2020 at 09:31, Maria Sorokina 
wrote:

> Dear community,
>
> It is decidedly substructure search and fingerprinting period of the year!
>
> I want to create (to store) raw fingerprints of a range of different
> fingerprint types for a big number of complex molecules (natural products).
>
> For example this:
>
> PubchemFingerprinter pubchemFingerprinter = new PubchemFingerprinter( 
> SilentChemObjectBuilder.getInstance() );
>
> System.out.println(pubchemFingerprinter.getRawFingerprint(myAtomContainer));
>
> For all my molecules I am getting an "UnsupportedOperationException",
> which according to the documentation reflects only the fact that the
> fingerprinter cannot produce the raw fingerprint.
> I am using the latest (2.3) version of the CDK.
> Can anybody help me with this issue?
>
>
> Kind regards,
> Maria
>
>
> Dr. Maria Sorokina
> Steinbeck Research Group
> Analytical Chemistry - Cheminformatics and Chemometrics
> Friedrich-Schiller-University Jena, Germany
> http://cheminf.uni-jena.de
>


Re: [Cdk-user] Substructure search using ShortestPathFingerprinter

2020-02-20 Thread John Mayfield
I've added a warning in the doc, there was already a warning on MACCS 166
keys.

https://github.com/cdk/cdk/commit/82cb4f8d49283e117696f40d09538c70790a18fd

On Thu, 20 Feb 2020 at 15:20, John Mayfield 
wrote:

> *wrote :-)
>
> On Thu, 20 Feb 2020 at 15:20, John Mayfield 
> wrote:
>
>> Only *Fingerprinter* or *ExtendedFingerprint* obey this transitivity
>> property.
>>
>> Relevant post I wrong in 2015:
>> https://nextmovesoftware.com/blog/2015/02/16/for-every-fingerprint-optimisation-there-is-an-equal-and-opposite-fingerprint-deterioration/
>>
>> On Thu, 20 Feb 2020 at 10:44, nicepeopleproject <
>> nicepeopleproj...@gmail.com> wrote:
>>
>>> Hello!
>>> I'm trying to implement substructure search. As I understand it, the
>>> ShortestPathFingerprinter is suitable for this, but I ran into the following
>>> problem. I attach two files (in molecules.zip). Using butane.mol as the
>>> query should find ciclopentane.mol. When I computed the BitSet for butane I got:
>>> {115, 503, 540, 653, 893}
>>> {115, 503, 542, 653, 893} - for cyclopentane.
>>> So I cannot find cyclopentane. Is there a way to make it work?
>>>
>>> --
>>> Best regards,
>>> Николаев Артём


Re: [Cdk-user] Substructure search using ShortestPathFingerprinter

2020-02-20 Thread John Mayfield
Only *Fingerprinter* or *ExtendedFingerprint* obey this transitivity
property.

Relevant post I wrong in 2015:
https://nextmovesoftware.com/blog/2015/02/16/for-every-fingerprint-optimisation-there-is-an-equal-and-opposite-fingerprint-deterioration/
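
To see concretely why the screen fails with the bit sets quoted below, here is a small stand-alone sketch using only java.util.BitSet. A fingerprint is only usable as a substructure pre-filter if every bit set for the query is also set for the target; the helper names here are made up for illustration and are not CDK API.

```java
import java.util.BitSet;

final class FingerprintScreen {

    // Builds a BitSet with the given bit positions set.
    static BitSet of(int... setBits) {
        BitSet bs = new BitSet();
        for (int b : setBits) {
            bs.set(b);
        }
        return bs;
    }

    // True if every bit set in the query is also set in the target,
    // i.e. the target passes the screen and is worth a full atom-by-atom match.
    static boolean mayContain(BitSet target, BitSet query) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target);
        return missing.isEmpty();
    }

    public static void main(String[] args) {
        BitSet butane = of(115, 503, 540, 653, 893);
        BitSet cyclopentane = of(115, 503, 542, 653, 893);
        // Bit 540 is set for butane but not for cyclopentane, so the screen rejects it.
        System.out.println(mayContain(cyclopentane, butane));
    }
}
```

With a fingerprint that does have the subset property, a rejected target can be skipped safely; with one that does not, as here, a true match is thrown away before the real substructure search ever runs.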

On Thu, 20 Feb 2020 at 10:44, nicepeopleproject 
wrote:

> Hello!
> I'm trying to implement substructure search. As I understand it, the
> ShortestPathFingerprinter is suitable for this, but I ran into the following
> problem. I attach two files (in molecules.zip). Using butane.mol as the
> query should find ciclopentane.mol. When I computed the BitSet for butane I got:
> {115, 503, 540, 653, 893}
> {115, 503, 542, 653, 893} - for cyclopentane.
> So I cannot find cyclopentane. Is there a way to make it work?
>
> --
> Best regards,
> Николаев Артём


Re: [Cdk-user] How can I use CDK on ia64 server using HP UX OS?

2019-12-02 Thread John Mayfield
Hi,

Providing you don't try and get an InChI from OPSIN/CDK then everything
else will work. Do you really need an InChI?

If you really do need it then I think the best option would be to build the
InChI library/executable yourself and then call out to it via system exec
via a Molfile:

 Runtime.getRuntime().exec("inchi input.mol"); // etc
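
A slightly fuller sketch of that system exec route, for what it's worth. It assumes the locally built executable is on the PATH as "inchi" and takes the Molfile path as its only argument, exactly as in the one-liner above; the real name and options depend on the build you produce. The class and method names are invented for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class ExternalInchi {

    // Assumption: "<executable> <molfile>" is the whole command line.
    static List<String> command(String executable, Path molfile) {
        return List.of(executable, molfile.toString());
    }

    // Runs the external program and returns its exit code.
    static int run(String executable, Path molfile) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command(executable, molfile))
                .inheritIO() // pass output through so the child cannot block on a full pipe
                .start();
        if (!p.waitFor(60, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            throw new IOException("external InChI process timed out");
        }
        return p.exitValue();
    }
}
```

Usage would be along the lines of ExternalInchi.run("inchi", Path.of("input.mol")), after writing the structure out as a Molfile first.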


You could also rebuild JNI-InChI but this is more complicated.

John

On Sun, 1 Dec 2019 at 12:55, 강신원  wrote:

> Hi, all.
>
> I'm making a simple chemical substructure using CDK and OPSIN library.
>
> I recently got to know that my program should run on ia64 server using HP
> UX OS, but jni-inchi in the CDK and OPSIN does not support that platform.
>
> Is there any one who have binary jni-inchi library for that platform or
> know how to get it?
>
> Help, please.
>


Re: [Cdk-user] Questions about the function "placeSpiroRing" in RingPlacer.java

2019-11-30 Thread John Mayfield
I already explained: you can have "spirodegree > 2" (i.e. degree > 4), in
which case it lays things out on top of each other (incorrect). The
numPlaced == 2 "nudge" is there to make the bond lengths longer.

Try commenting out the if conditions and use the SMILES I gave last time to
see the effect:

C1CO[Fe]234(O1)OCCO2.C(CO3)O4
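
To put numbers on it, here is a small stand-alone sketch (no CDK classes) of the two quantities that differ between the degree == 4 and degree != 4 branches of the quoted method below: the rotation theta and the distance from the spiro atom to the new ring centre. The class and method names are invented, and the circumradius formula is my assumption of what getNativeRingRadius computes, so treat it as an illustration only.

```java
final class SpiroGeometry {

    // Rotation applied to ringCenterVector when degree != 4; same expression as in
    // the quoted method, including the integer division degree / 2.
    static double theta(int degree) {
        return Math.PI - (2 * Math.PI / (degree / 2));
    }

    // Circumradius of a regular polygon with ringSize sides of length bondLength.
    // Assumption: this matches getNativeRingRadius.
    static double ringRadius(int ringSize, double bondLength) {
        return bondLength / (2 * Math.sin(Math.PI / ringSize));
    }

    // Distance from the spiro atom to the new ring centre: radius in the simple
    // case (degree == 4), 2 * radius when more rings share the atom.
    static double centreDistance(int degree, int ringSize, double bondLength) {
        double radius = ringRadius(ringSize, bondLength);
        return degree == 4 ? radius : 2 * radius;
    }

    public static void main(String[] args) {
        // The iron centre in the SMILES above has six bonds, so degree = 6.
        System.out.println(Math.toDegrees(theta(6)));
        System.out.println(centreDistance(4, 5, 1.5));
        System.out.println(centreDistance(6, 5, 1.5));
    }
}
```

For degree 6 the vector is rotated by a third of a half-turn and the ring centre is pushed out to twice the ring radius, which is what spreads the three rings apart instead of stacking them.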



On Sat, 30 Nov 2019 at 09:30, 努力努力 <843982...@qq.com> wrote:

> Why do we have special treatment when degree==4 and numPlaced==2?
> In my understanding, "degree" is the number of bonds connected to
> sharedAtoms, and numPlaced is the number of other ring atoms, excluding
> sharedAtoms.
> Looking forward to your reply. Thank you!
>
> The source code from CDK is here:
> public void placeSpiroRing(IRing ring, IAtomContainer sharedAtoms, Point2d sharedAtomsCenter,
>                            Vector2d ringCenterVector, double bondLength) {
>
>     IAtom startAtom = sharedAtoms.getAtom(0);
>     List<IBond> mBonds = molecule.getConnectedBondsList(sharedAtoms.getAtom(0));
>     final int degree = mBonds.size();
>     logger.debug("placeSpiroRing: D=", degree);
>
>     // recalculate the ringCentreVector
>     if (degree != 4) {
>
>         int numPlaced = 0;
>         for (IBond bond : mBonds) {
>             IAtom nbr = bond.getOther(sharedAtoms.getAtom(0));
>             if (!nbr.getFlag(CDKConstants.ISPLACED))
>                 continue;
>             numPlaced++;
>         }
>
>         if (numPlaced == 2) {
>             // nudge the shared atom such that bond lengths will be
>             // equal
>             startAtom.getPoint2d().add(ringCenterVector);
>             sharedAtomsCenter.add(ringCenterVector);
>         }
>
>         double theta = Math.PI-(2 * Math.PI / (degree / 2));
>         rotate(ringCenterVector, theta);
>     }
>
>     double radius = getNativeRingRadius(ring, bondLength);
>     Point2d ringCenter = new Point2d(sharedAtomsCenter);
>     if (degree == 4) {
>         ringCenterVector.normalize();
>         ringCenterVector.scale(radius);
>     } else {
>         // spread things out a little for multiple spiro centres
>         ringCenterVector.normalize();
>         ringCenterVector.scale(2*radius);
>     }
>     ringCenter.add(ringCenterVector);
>     double addAngle = 2 * Math.PI / ring.getRingSize();
>
>     IAtom currentAtom = startAtom;
>     double startAngle = GeometryUtil.getAngle(startAtom.getPoint2d().x - ringCenter.x,
>                                               startAtom.getPoint2d().y - ringCenter.y);
>
>     /*
>      * Get one bond connected to the spiro bridge atom. It doesn't matter in
>      * which direction we draw.
>      */
>     List<IBond> rBonds = ring.getConnectedBondsList(startAtom);
>
>     IBond currentBond = (IBond) rBonds.get(0);
>
>     Vector<IAtom> atomsToDraw = new Vector<IAtom>();
>     /*
>      * Store all atoms to draw in consecutive order relative to the chosen
>      * bond.
>      */
>     for (int i = 0; i < ring.getBondCount(); i++) {
>         currentBond = ring.getNextBond(currentBond, currentAtom);
>         currentAtom = currentBond.getOther(currentAtom);
>         if (!currentAtom.equals(startAtom))
>             atomsToDraw.addElement(currentAtom);
>     }
>     logger.debug("currentAtom  " + currentAtom);
>     logger.debug("startAtom  " + startAtom);
>
>     atomPlacer.populatePolygonCorners(atomsToDraw, ringCenter, startAngle, addAngle, radius);
>
> }


Re: [Cdk-user] Questions about the function "Addring"

2019-11-20 Thread John Mayfield
It's calculated differently because without it the rings get laid out on
top of each other. Example case, you can reverse this commit:

https://github.com/cdk/cdk/commit/6533533a95b5e9ca0d55d0d37ab5f048a25e88f7#diff-da65f1759b150e9510a643e017112b3f

And see how it lays out the following.

C1CO[Fe]234(O1)OCCO2.C(CO3)O4

[image: image.png]

The old code would generate this:

[image: image.png]

because the bond vector was pointing towards the centre of the ring.
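
For anyone following the populatePolygonCorners(atomsToDraw, ringCenter, startAngle, addAngle, radius) call discussed in the quoted message, here is a rough stand-alone sketch of placing ring atoms on a circle around a centre. It is not the CDK implementation: the class name is invented, and it assumes the angle is advanced by addAngle before each atom is placed, so that the shared atom keeps startAngle and the remaining atoms follow it round the ring.

```java
final class PolygonCorners {

    // Returns {x, y} for each of n atoms placed on a circle around (cx, cy).
    // Assumption: the first atom sits at startAngle + addAngle, because the
    // shared (already placed) atom occupies startAngle itself.
    static double[][] corners(int n, double cx, double cy,
                              double startAngle, double addAngle, double radius) {
        double[][] pts = new double[n][2];
        double angle = startAngle;
        for (int i = 0; i < n; i++) {
            angle += addAngle;
            pts[i][0] = cx + Math.cos(angle) * radius;
            pts[i][1] = cy + Math.sin(angle) * radius;
        }
        return pts;
    }

    public static void main(String[] args) {
        // A four-membered ring: the shared atom is at angle 0, three atoms remain.
        double[][] pts = corners(3, 0, 0, 0, 2 * Math.PI / 4, 1.0);
        for (double[] p : pts) {
            System.out.printf("%.3f %.3f%n", p[0], p[1]);
        }
    }
}
```

This is why the ring centre matters so much: every atom position is derived from it, so a centre vector pointing the wrong way puts the whole new ring on top of an existing one.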

John

On Wed, 20 Nov 2019 at 16:36, Christoph Steinbeck <
christoph.steinb...@uni-jena.de> wrote:

> This is very old code and it seems that others have changed it (for the better
> :)) since I wrote it a long time ago.
> If I understand you correctly, there is actually no bug, just an apparent
> inconsistency that you are reporting.
> The simplest thing would be for you to remove the case distinction and see
> what happens.
> Maybe that reveals the reason for the distinction.
>
> I’d love to dig into this but I lack the time for such fun these days
> Very sad. :D
>
> All the best,
>
> Chris
>
> —
> Prof. Dr. Christoph Steinbeck
> Analytical Chemistry - Cheminformatics and Chemometrics
> Friedrich-Schiller-University Jena, Germany
> Phone Secretariat: +49-3641-948171
> http://cheminf.uni-jena.de
> http://orcid.org/-0001-6966-0814
>
> What is man but that lofty spirit - that sense of enterprise.
> ... Kirk, "I, Mudd," stardate 4513.3..
>
> > On 18. Nov 2019, at 16:24, 努力努力 <843982...@qq.com> wrote:
> >
> > Thanks for your reply.
> > I want to implement a JChemPaint in C#. In fact, I used a CDK port named
> > NCDK, which is a C# implementation of the Chemistry Development Kit:
> > https://github.com/kazuyaujihara/NCDK.
> > But I have run into some bugs, so I want to read the CDK source code and
> > fix them.
> > The bug is like this:
> > When we call addRing, we need to calculate the position of the virtual ring.
> > From the code, I understand that the position of the new ring is calculated
> > from several variables, including the new ring center, startAngle,
> > addAngle and radius.
> > The code in function "placeSpiroRing" is
> > atomPlacer.populatePolygonCorners(atomsToDraw, ringCenter, startAngle, addAngle, radius);
> > The variable ringCenter depends on the variable ringCenterVector,
> > which is a vector pointing to the center of the new ring.
> >
> > For example, when I want to add a triangle to a shared atom, it
> > satisfies numPlaced==2. It seems that the shared atom's position will be
> > changed.
> > The code in function "placeSpiroRing" is
> > if (numPlaced == 2) {
> > // nudge the shared atom such that bond lengths will be
> > // equal
> > startAtom.getPoint2d().add(ringCenterVector);
> > sharedAtomsCenter.add(ringCenterVector);
> > }
> > And ringCenterVector is recalculated differently when degree == 4 and
> > when degree != 4. Why?
> > The code in function "placeSpiroRing" is
> > if (degree == 4) {
> > ringCenterVector.normalize();
> > ringCenterVector.scale(radius);
> > } else {
> > // spread things out a little for multiple spiro centres
> > ringCenterVector.normalize();
> > ringCenterVector.scale(2*radius);
> > }
> > I'm confused. Or I understand it wrong.
> >
> > Thank you for taking your time to read this letter again.
> >
> > -- Original Message --
> > From: "Christoph Steinbeck";
> > Sent: Monday, 18 November 2019, 6:55 PM
> > To: "努力努力"<843982...@qq.com>;
> > Cc: "cdk-user";
> > Subject: Re: [Cdk-user] Questions about the function "Addring"
> >
> > Can you comment on what you try to achieve?
> > The method that you are referring to is a quite specialised method for
> structure diagram layout.
> > Are you trying to create 2D drawings of some molecule or fragment, or
> maybe something else?
> >
> > Kind regards,   Chris
> >
> > —
> > Prof. Dr. Christoph Steinbeck
> > Analytical Chemistry - Cheminformatics and Chemometrics
> > Friedrich-Schiller-University Jena, Germany
> > Phone Secretariat: +49-3641-948171
> > http://cheminf.uni-jena.de
> > http://orcid.org/-0001-6966-0814
> >
> > What is man but that lofty spirit - that sense of enterprise.
> > ... Kirk, "I, Mudd," stardate 4513.3..
> >
> > > On 18. Nov 2019, at 11:32, 努力努力 <843982...@qq.com> wrote:
> > >
> > > Dear all,
> > > I want to understand how rings are added at an atom. In the function
> > > "Addring", I find the code "ringPlacer.PlaceSpiroRing" and then jump to the
> > > function "placeSpiroRing". I have some questions about this function. Why
> > > do we have special treatment when degree==4 and numPlaced==2? In my
> > > understanding, "degree" is the number of bonds connected to sharedAtoms,
> > > and numPlaced is the number of other ring atoms, excluding sharedAtoms.
> > > Looking forward to your reply. Thank you!
> > >
> > > The source code from CDK is here:
> > > public void placeSpiroRing(IRing ring, IAtomContainer sharedAtoms,
> Point2d 

Re: [Cdk-user] Reg: Reading Jmol generated SDF file using Iterating SDF reader cdk 1.5.8

2019-11-17 Thread John Mayfield
IIRC Jmol was generating them incorrectly; Bob (Jmol dev) patched it and we
also updated our code to be more tolerant. Please try CDK 2.3 and, if there
is still an issue, report it via GitHub Issues.

On Sun, 17 Nov 2019 at 09:50, Vinothkumar Mohanakrishnan 
wrote:

> Dear Users,
>
> I would like to read a SDF file using IteratingSDFReader. The SDF file is 
> generated by jmol (see below).
>
> U:/research/project_opas/Code/MVC_OPAS/build/check.sdf
> __Jmol-14_11161922023D 1   1.0 0.0 0
> Jmol version 14.9.1  2017-02-18 13:47 EXTRACT: ({0:43})
>  22 22  0  0  0  0  1 V2000
>  -10.26842  21.30587  -2.02430 N   0  0  0  0  0  0
>  -11.14109  20.23272  -2.39821 C   0  0  0  0  0  0
>  -10.84018  18.85777  -1.92734 C   0  0  0  0  0  0
>  -11.60011  17.94142  -2.23013 O   0  0  0  0  0  0
>  -12.38167  20.49262  -3.21797 C   0  0  0  0  0  0
>  -12.66761  21.93940  -3.49472 C   0  0  0  0  0  0
>  -13.73838  22.56509  -3.02287 C   0  0  0  0  0  0
>  -11.88128  22.81522  -4.30956 N   0  0  0  0  0  0
>  -12.65747  24.01784  -4.21934 C   0  0  0  0  0  0
>  -13.70560  23.88527  -3.49217 N   0  0  0  0  0  0
>   -9.64144  18.61510  -1.15438 N   0  0  0  0  0  0
>   -6.47461  15.46647   0.53840 C   0  0  0  0  0  0
>   -5.16207  15.52031  -0.21484 C   0  0  0  0  0  0
>   -4.06313  15.41673   0.72449 N   0  0  0  0  0  0
>   -9.80464  17.44463  -0.29800 C   0  0  0  0  0  0
>   -8.64600  16.48725  -0.41087 C   0  0  0  0  0  0
>   -8.90795  15.33022  -1.01563 C   0  0  0  0  0  0
>   -7.33993  16.68411   0.28862 C   0  0  0  0  0  0
>   -7.04245  17.86578   0.93854 O   0  0  0  0  0  0
>   -6.07225  18.51085   0.19546 C   0  0  0  0  0  0
>   -6.39116  19.21376  -0.74531 O   0  0  0  0  0  0
>   -4.69541  18.41611   0.56612 N   0  0  0  0  0  0
>   2  1  1  0  0  0
>   2  3  1  0  0  0
>   3 11  1  0  0  0
>   4  3  2  0  0  0
>   5  2  1  0  0  0
>   6  5  1  0  0  0
>   6  7  2  0  0  0
>   8  9  1  0  0  0
>   8  6  1  0  0  0
>   9 10  2  0  0  0
>  10  7  1  0  0  0
>  13 12  1  0  0  0
>  13 14  1  0  0  0
>  15 16  1  0  0  0
>  16 17  2  0  0  0
>  16 18  1  0  0  0
>  18 19  1  0  0  0
>  19 20  1  0  0  0
>  20 21  2  0  0  0
>  20 22  1  0  0  0
>  11 15  1  0  0  0
>  12 18  1  0  0  0
> M  END
> 
> U:/research/project_opas/Code/MVC_OPAS/build/check.sdf
> __Jmol-14_11161922023D 1   1.0 0.0 0
> Jmol version 14.9.1  2017-02-18 13:47 EXTRACT: ({0:43})
>  22 22  0  0  0  0  1 V2000
>  -10.34994  21.23775  -2.14993 N   0  0  0  0  0  0
>  -11.34250  20.30777  -2.59501 C   0  0  0  0  0  0
>  -11.13198  18.85018  -2.50356 C   0  0  0  0  0  0
>  -12.03183  18.12532  -2.90109 O   0  0  0  0  0  0
>  -12.62336  20.72709  -3.26233 C   0  0  0  0  0  0
>  -12.81469  22.18660  -3.49445 C   0  0  0  0  0  0
>  -13.85760  22.84610  -3.01911 C   0  0  0  0  0  0
>  -11.99766  23.01098  -4.32672 N   0  0  0  0  0  0
>  -12.72729  24.24473  -4.23861 C   0  0  0  0  0  0
>  -13.77078  24.16132  -3.50354 N   0  0  0  0  0  0
>   -9.86955  18.26965  -2.15583 N   0  0  0  0  0  0
>   -7.30122  16.52601   0.75455 C   0  0  0  0  0  0
>   -5.96259  17.25681   0.80966 C   0  0  0  0  0  0
>   -4.87734  16.30636   0.66950 N   0  0  0  0  0  0
>  -12.83689  15.52634  -0.85527 C   0  0  0  0  0  0
>  -11.62686  16.04439  -1.46833 N   0  0  0  0  0  0
>  -11.00919  14.97258  -2.22952 C   0  0  0  0  0  0
>  -10.70605  16.47306  -0.42756 C   0  0  0  0  0  0
>   -9.83868  17.64207  -0.85262 C   0  0  0  0  0  0
>   -8.72671  18.07421   0.02162 C   0  0  0  0  0  0
>   -8.09045  19.06942  -0.28779 O   0  0  0  0  0  0
>   -8.35385  17.36976   1.15010 O   0  0  0  0  0  0
>   2  1  1  0  0  0
>   2  3  1  0  0  0
>   3 11  1  0  0  0
>   4  3  2  0  0  0
>   5  2  1  0  0  0
>   6  5  1  0  0  0
>   6  7  2  0  0  0
>   8  9  1  0  0  0
>   8  6  1  0  0  0
>   9 10  2  0  0  0
>  10  7  1  0  0  0
>  13 12  1  0  0  0
>  13 14  1  0  0  0
>  15 16  1  0  0  0
>  16 17  1  0  0  0
>  16 18  1  0  0  0
>  18 19  1  0  0  0
>  19 20  1  0  0  0
>  20 21  2  0  0  0
>  20 22  1  0  0  0
>  11 19  1  0  0  0
>  12 22  1  0  0  0
> M  END
> 
>
> I am using CDK 1.5.8 (I have to stick to this version for compatibility 
> issues). I am trying to read the SDF file using the below snippet
>
> public static List<IAtomContainer> readFragments(String fileName) throws IOException, CDKException {
>
>     List<IAtomContainer> frags = new ArrayList<>();
>
>     File sdfFile = new File(fileName);
>
>     IteratingSDFReader sdfReader = new IteratingSDFReader(new FileInputStream(sdfFile),
>                                                           DefaultChemObjectBuilder.getInstance());
>
>     while (sdfReader.hasNext()) {
>
>         IAtomContainer molecule = (IAtomContainer) sdfReader.next();
>
>         frags.add(molecule);
>
>     }
>     sdfReader.close();
>
>     return frags;
> }
>
> The function works perfectly fine for SDF files generated by CDK and Open Babel
> and returns null for Jmol generated 

Re: [Cdk-user] Smarts cast exception

2019-11-08 Thread John Mayfield
No problem,

So essentially there are "molecules" and  "queries". Molecules are things,
queries match things. We can convert a molecule to a query by telling it
what things we want to match.

John

On Fri, 8 Nov 2019 at 09:27, Stesycki, Manuel 
wrote:

> Ok i looked up the Test class and did the following:
>
> public static String createSMARTS(AtomContainer ac) {
>
>     String ret;
>
>     try {
>
>         QueryAtomContainer qac = QueryAtomContainer.create(ac,
>                 Expr.Type.ALIPHATIC_ELEMENT,
>                 Expr.Type.AROMATIC_ELEMENT,
>                 Expr.Type.SINGLE_OR_AROMATIC,
>                 Expr.Type.ALIPHATIC_ORDER,
>                 Expr.Type.ISOTOPE,
>                 Expr.Type.RING_BOND_COUNT
>         );
>
>         ret = Smarts.generate(qac);
>
>     } catch (Exception e) {
>         ret = "";
>     }
>
>     return ret;
> }
>
> This works for me.
>
> Sorry to bother you,
>Manuel Stesycki
>
> IT
>0208 / 306-2146
>Physikbau, Büro 117
>stesy...@mpi-muelheim.mpg.de
>
> Max-Planck-Institut für Kohlenforschung
>Kaiser-Wilhelm-Platz 1
>D-45470 Mülheim an der Ruhr
>http://www.kofo.mpg.de/de
>
> On 07.11.2019 at 14:34, Stesycki, Manuel <
> stesy...@mpi-muelheim.mpg.de> wrote:
>
> Dear all,
>
> i am trying to create SMARTS-Pattern for a structure.
> The structure is an AtomContainer and read from an MDL file.
> If i try to call Smarts.generate( AtomContainer )
> i get this error message:
>
> java.lang.ClassCastException: org.openscience.cdk.Bond cannot be cast to
> org.openscience.cdk.isomorphism.matchers.QueryBond
>
> Has anyone a clue, where my mistake is?
>
> Many thanks,
>Manuel Stesycki
>
> IT
>0208 / 306-2146
>Physikbau, Büro 117
>stesy...@mpi-muelheim.mpg.de
>
> Max-Planck-Institut für Kohlenforschung
>Kaiser-Wilhelm-Platz 1
>D-45470 Mülheim an der Ruhr
>http://www.kofo.mpg.de/de
>
>


Re: [Cdk-user] Bug in MCS determination?

2019-08-28 Thread John Mayfield
Which SMSD are you using? I don't have control over the downstream one.

On Wed, 28 Aug 2019 at 10:16, Tim Dudgeon  wrote:

> Unfortunately other parts of my code are using new features such as
> IAtomContainer.atoms() so whilst switching to the legacy IAtomContainer
> avoids the alignment problem it looks to be a no go as a solution.
>
> Would switching to the legacy classes in the org.openscience.cdk.smsd
> package be an option or do I just need to wait for the problem to be fixed?
>
>
> On 28/08/2019 08:15, John Mayfield wrote:
>
> Okay code is likely adding atoms/bonds in the wrong order, will fix it.
>
> On Tue, 27 Aug 2019 at 18:18, Tim Dudgeon  wrote:
>
>> Hi John,
>>
>> Yes, turning off AtomContainer2 avoids the error.
>>
>>
>> On 27/08/2019 16:31, John Mayfield wrote:
>>
>> Hmm odd, in legacy so expected but tests seem okay. Can you try turning
>> off AtomContainer2, https://github.com/cdk/cdk/wiki/AtomContainer2
>>
>> On Tue, 27 Aug 2019 at 14:19, Tim Dudgeon  wrote:
>>
>>> Hi folks,
>>>
>>> I'm getting a NPE from AtomAtomMapping.getCommonFragmentAsSMILES() in
>>> certain cases.
>>> An example is below - the two structures differ only for a Cl <-> Br
>>> change.
>>>
>>> This is using the org.openscience.smsd.AtomAtomMapping,
>>> org.openscience.smsd.Isomorphism and
>>> org.openscience.smsd.tools.ExtAtomContainerManipulator classes.
>>> Not sure if those guys are active on this list?
>>>
>>>
>>> IAtomContainer query =  smilesParser.parseSmiles('BrC1CCC(Cc2c2)C1')
>>> IAtomContainer target = smilesParser.parseSmiles('ClC1CCC(Cc2c2)C1')
>>>
>>> StructureDiagramGenerator sdg = new StructureDiagramGenerator()
>>> sdg.generateCoordinates(query)
>>>
>>> ExtAtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(query)
>>> ExtAtomContainerManipulator.aromatizeMolecule(query)
>>>
>>> ExtAtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(target)
>>> ExtAtomContainerManipulator.aromatizeMolecule(target)
>>>
>>> Isomorphism comparison = new Isomorphism(query, target, Algorithm.DEFAULT, 
>>> true, false, false)
>>> AtomAtomMapping mapping = comparison.getFirstAtomMapping()
>>> String mcsSmiles = mapping.getCommonFragmentAsSMILES()
>>>
>>> The error I get is:
>>>
>>> java.lang.NullPointerException
>>> at
>>> org.openscience.cdk.silent.AtomContainer2.getAtomRefUnsafe(AtomContainer2.java:172)
>>> at
>>> org.openscience.cdk.silent.AtomContainer2.getBond(AtomContainer2.java:612)
>>> at
>>> org.openscience.smsd.AtomAtomMapping.getCommonFragment(AtomAtomMapping.java:332)
>>> at
>>> org.openscience.smsd.AtomAtomMapping.getCommonFragmentAsSMILES(AtomAtomMapping.java:371)
>>> at
>>> org.squonk.fragnet.depict.ChemUtilsSpec.alignMolecule2(ChemUtilsSpec.groovy:81)
>>>
>>>
>>>
>>>


Re: [Cdk-user] Bug in MCS determination?

2019-08-28 Thread John Mayfield
Okay code is likely adding atoms/bonds in the wrong order, will fix it.
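
Until that is fixed, the workaround from the quoted messages is to turn AtomContainer2 off. A minimal sketch of doing that from code follows. The property name is an assumption recalled from the AtomContainer2 wiki page linked below, so please check it there before relying on it; the class name is made up. The property has to be set before any CDK builder class is loaded.

```java
final class LegacyAtomContainerSwitch {

    // Assumption: property name as recalled from the AtomContainer2 wiki page;
    // verify it there before relying on it.
    static final String PROPERTY = "CdkUseLegacyAtomContainer";

    // Call this first thing in main(), before touching any CDK class.
    static void enable() {
        System.setProperty(PROPERTY, "true");
    }

    public static void main(String[] args) {
        enable();
        System.out.println(PROPERTY + "=" + System.getProperty(PROPERTY));
    }
}
```

Passing the same property on the command line with -D avoids any class-loading order concerns.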

On Tue, 27 Aug 2019 at 18:18, Tim Dudgeon  wrote:

> Hi John,
>
> Yes, turning off AtomContainer2 avoids the error.
>
>
> On 27/08/2019 16:31, John Mayfield wrote:
>
> Hmm odd, in legacy so expected but tests seem okay. Can you try turning
> off AtomContainer2, https://github.com/cdk/cdk/wiki/AtomContainer2
>
> On Tue, 27 Aug 2019 at 14:19, Tim Dudgeon  wrote:
>
>> Hi folks,
>>
>> I'm getting a NPE from AtomAtomMapping.getCommonFragmentAsSMILES() in
>> certain cases.
>> An example is below - the two structures differ only for a Cl <-> Br
>> change.
>>
>> This is using the org.openscience.smsd.AtomAtomMapping,
>> org.openscience.smsd.Isomorphism and
>> org.openscience.smsd.tools.ExtAtomContainerManipulator classes.
>> Not sure if those guys are active on this list?
>>
>>
>> IAtomContainer query =  smilesParser.parseSmiles('BrC1CCC(Cc2c2)C1')
>> IAtomContainer target = smilesParser.parseSmiles('ClC1CCC(Cc2c2)C1')
>>
>> StructureDiagramGenerator sdg = new StructureDiagramGenerator()
>> sdg.generateCoordinates(query)
>>
>> ExtAtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(query)
>> ExtAtomContainerManipulator.aromatizeMolecule(query)
>>
>> ExtAtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(target)
>> ExtAtomContainerManipulator.aromatizeMolecule(target)
>>
>> Isomorphism comparison = new Isomorphism(query, target, Algorithm.DEFAULT, 
>> true, false, false)
>> AtomAtomMapping mapping = comparison.getFirstAtomMapping()
>> String mcsSmiles = mapping.getCommonFragmentAsSMILES()
>>
>> The error I get is:
>>
>> java.lang.NullPointerException
>> at
>> org.openscience.cdk.silent.AtomContainer2.getAtomRefUnsafe(AtomContainer2.java:172)
>> at
>> org.openscience.cdk.silent.AtomContainer2.getBond(AtomContainer2.java:612)
>> at
>> org.openscience.smsd.AtomAtomMapping.getCommonFragment(AtomAtomMapping.java:332)
>> at
>> org.openscience.smsd.AtomAtomMapping.getCommonFragmentAsSMILES(AtomAtomMapping.java:371)
>> at
>> org.squonk.fragnet.depict.ChemUtilsSpec.alignMolecule2(ChemUtilsSpec.groovy:81)
>>
>>
>>
>>
>> ___
>> Cdk-user mailing list
>> Cdk-user@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/cdk-user
>>
>


Re: [Cdk-user] depiction without stereochemistry

2019-08-21 Thread John Mayfield
Technically 2.3 isn't officially released yet as I've not had time to do the
patch notes ;-), but feel free to use it.

You can just remove the stereo information? The wedge/hatch info is display
only, the actual information is stored as a list of stereo elements. You
can clear this:

> mol.setStereoElements(new ArrayList());


Again if you've already generated a layout when stereo is there you will
need to clear the bond display. This is actually one of the reasons I added
the "bond display" to emphasise it really is only a display option and the
stereo info isn't stored there.

On Wed, 21 Aug 2019 at 14:51, Tim Dudgeon  wrote:

> I'm using the 2.3 release.
>
> Using bond.setDisplay(IBond.Display.Solid) does work, but there's a big
> gotcha. If the molecule still needs to be laid out (e.g. no 2D
> coordinates) then when the DepictionGenerator does the layout the bond's
> display property gets reset to the chiral representation.
>
> The workaround is to make sure that 2D coordinates are present (e.g. using
> StructureDiagramGenerator) and then set the display property.
>
> Tim
>
>
> On 20/08/2019 17:31, John Mayfield wrote:
>
> Are you using the very latest release? I think it's an oversight. Try
>
> bond.setDisplay(IBond.Display.Solid);
>
>
> On Tue, 20 Aug 2019 at 16:40, Tim Dudgeon  wrote:
>
>> Hi, I'm wanting to depict a molecule without stereochemistry, but the
>> DepictionGenerator stubbornly seems to add it back.
>> What is the way to do this? In the example below a squiggle bond is
>> displayed even though all bonds have been set to Stereo.NONE and I have
>> already laid out the molecule.
>>
>> void "no stereo"() {
>>
>>     IAtomContainer mol = ChemUtils.readSmiles("c1ccc(N=C2SCCN2c2c2)cc1")
>>     StructureDiagramGenerator sdg = new StructureDiagramGenerator();
>>     sdg.setMolecule(mol);
>>     sdg.generateCoordinates(new Vector2d(0, 1));
>>     mol = sdg.getMolecule();
>>     for (IBond bond : mol.bonds()) {
>>         bond.setStereo(IBond.Stereo.NONE);
>>     }
>>     DepictionGenerator g = new DepictionGenerator()
>>     Depiction d = g.depict(mol)
>>
>>     when:
>>     def img = d.toImg()
>>     ByteArrayOutputStream out = new ByteArrayOutputStream();
>>     ImageIO.write(img, "png", out);
>>     out.close();
>>     byte[] png = out.toByteArray();
>>     Files.write(java.nio.file.Paths.get("/tmp/myimage5.png"), png)
>>
>>     then:
>>     png != null
>>     png.length > 0
>> }
>>
>>


Re: [Cdk-user] depiction without stereochemistry

2019-08-20 Thread John Mayfield
Are you using the very latest release? I think it's an oversight. Try

bond.setDisplay(IBond.Display.Solid);


On Tue, 20 Aug 2019 at 16:40, Tim Dudgeon  wrote:

> Hi, I'm wanting to depict a molecule without stereochemistry, but the
> DepictionGenerator stubbornly seems to add it back.
> What is the way to do this? In the example below a squiggle bond is
> displayed even though all bonds have been set to Stereo.NONE and I have
> already laid out the molecule.
>
> void "no stereo"() {
>
>     IAtomContainer mol = ChemUtils.readSmiles("c1ccc(N=C2SCCN2c2c2)cc1")
>     StructureDiagramGenerator sdg = new StructureDiagramGenerator();
>     sdg.setMolecule(mol);
>     sdg.generateCoordinates(new Vector2d(0, 1));
>     mol = sdg.getMolecule();
>     for (IBond bond : mol.bonds()) {
>         bond.setStereo(IBond.Stereo.NONE);
>     }
>     DepictionGenerator g = new DepictionGenerator()
>     Depiction d = g.depict(mol)
>
>     when:
>     def img = d.toImg()
>     ByteArrayOutputStream out = new ByteArrayOutputStream();
>     ImageIO.write(img, "png", out);
>     out.close();
>     byte[] png = out.toByteArray();
>     Files.write(java.nio.file.Paths.get("/tmp/myimage5.png"), png)
>
>     then:
>     png != null
>     png.length > 0
> }
>
>


Re: [Cdk-user] MCS and alignment

2019-08-15 Thread John Mayfield
In that case include the cdk-legacy module and use that version of SMSD.

Here's the GIST I previously wrote to align to a subgraph:

https://gist.github.com/johnmay/12797a89f4186bc7da881f1f4a706671

On Wed, 14 Aug 2019 at 18:21, Tim Dudgeon  wrote:

> Hi John,
>
> Thanks for that info. I did look into SMSD, but found some problems using
> it with the latest CDK [1,2].
>
> Also, the maven version has not been updated since Jun 2016 so I wonder if
> it's still active?
>
> Let me know if you want help with the utility function you mention. Happy
> to help, but not sure right now how to approach it.
>
> CDK rendering is so beautiful!
>
> Tim
>
> 1. https://github.com/asad/SMSD/issues/9
> 2. https://github.com/asad/SMSD/issues/10
>
>
> On 14/08/2019 16:05, John Mayfield wrote:
>
> 2. Use SMSD or Edmund Duesbury's MCS code. SMSD is a separate library now
> as we couldn't smoothly integrate the updates and had tests failing.
>
> 3. You can fix atoms in place with the *Set<IAtom> afix* option of the
> layout. So you copy the coords from MCS you got, fix these in place whilst
> you lay out the rest.
>
> One day I will get around to adding a utility function for this but there
> is some example code on the mailing list for No. 3, look for emails from
> someone at Dotmatics albeit with a substructure match.
>
> On Tue, 13 Aug 2019 at 17:01, Tim Dudgeon  wrote:
>
>> I'm wanting to depict molecules that have been aligned to the MCS of a
>> query molecule, and highlight the MCS.
>> Are there any examples of this? Seems like some of the relevant CDK code
>> is deprecated, but it's not clear what should be used.
>>
>> As an example:
>>
>> 1. I have a query molecule and a target molecule and they share
>> significant MCS, often with the query being a complete subgraph of the
>> target.
>>
>> 2. I identify the MCS
>>
>> 3. Using that MCS I layout the target molecule (e.g. generate 2D
>> coordinates) fixing the parts that are in the MCS to the coordinates
>> from the query structure.
>>
>> 4. I then depict that laid out molecule colouring the MCS.
>>
>> I know how to do #4 - it's steps #2 and #3 that I'm unsure about.
>>
>>
>>


Re: [Cdk-user] controlling hydrogen display with DepictionGenerator

2019-08-02 Thread John Mayfield
I think you probably want this:

new DepictionGenerator().withParam(StandardGenerator.Visibility.class,
    new SymbolVisibility() {
        @Override
        public boolean visible(IAtom atom, List<IBond> neighbors, RendererModel model) {
            return atom.getAtomicNumber() != 6;
        }
    });


On Fri, 2 Aug 2019 at 16:26, Tim Dudgeon  wrote:

> On 02/08/2019 15:42, John Mayfield wrote:
>
> 1. explicit hydrogens are also not rendered
>
>
> Er... I showed the example with the explicit hydrogens being rendered.
> Let's clarify, what does "explicit hydrogen" mean to you?
>
> Sorry, my mistake. The explicit H is displayed. I was getting mixed up
> with too many examples!
>
>
>
>
>> 2. Carbon symbols are displayed (as if the withCarbonSymbols() method had
>> been called).
>
> The "Visibility" option controls this, but the other answer is just don't
> set to zero on carbons?
>
> Is there an example of how to use this "Visibility" option? Doesn't seem
> to be an option of DepictionGenerator.
>
> Not setting to zero on carbons is not so straight forward as you would
> still want this on terminal carbons.
> I suppose these would be carbons with only one bond to a non-hydrogen
> atom. Does this look right?
>
> for (IAtom atom : mol.atoms()) {
>     if (atom.getAtomicNumber() == 6) {
>         // count the number of connections that are heavy atoms
>         int numHeavy = 0;
>         for (IBond bond : atom.bonds()) {
>             IAtom other = bond.getOther(atom);
>             if (other.getAtomicNumber() > 1) {
>                 numHeavy++;
>             }
>         }
>         // if only one then this is a terminal carbon so we need to leave the Hs in place
>         if (numHeavy < 2) {
>             atom.setImplicitHydrogenCount(0);
>         }
>     } else { // non-carbon atoms
>         atom.setImplicitHydrogenCount(0);
>     }
> }
>
> Tim
>
>
> On Fri, 2 Aug 2019 at 12:09, Egon Willighagen 
> wrote:
>
>>
>>
>> On Fri, Aug 2, 2019 at 11:28 AM John Mayfield <
>> john.wilkinson...@gmail.com> wrote:
>>
>>> Other option You can also use the old *BasicAtomGenerator* which never
>>> puts hydrogens on anything...
>>>
>>
>> Documentation on the old generator stack can be found in this copy of the
>> Groovy CDK book, "Depiction" chapter:
>>
>>
>> https://figshare.com/articles/Edition_1_4_1_0_of_Groovy_Cheminformatics_with_the_Chemistry_Development_Kit/2057790
>>
>> Egon
>>
>> --
>> Hi, do you like citation networks? Already 51% of all citations are
>> available <https://i4oc.org/> available for innovative new uses
>> <https://twitter.com/hashtag/acs2ioc>. Join me in asking the American
>> Chemical Society to join the Initiative for Open Citations too
>> <https://www.change.org/p/asking-the-american-chemical-society-to-join-the-initiative-for-open-citations>.
>>  SpringerNature,
>> the RSC and many others already did <https://i4oc.org/#publishers>.
>>
>> -
>> E.L. Willighagen
>> Department of Bioinformatics - BiGCaT
>> Maastricht University (http://www.bigcat.unimaas.nl/)
>> Homepage: http://egonw.github.com/
>> Blog: http://chem-bla-ics.blogspot.com/
>> PubList: https://www.zotero.org/egonw
>> ORCID: -0001-7542-0286 <http://orcid.org/-0001-7542-0286>
>> ImpactStory: https://impactstory.org/u/egonwillighagen
>>
>


Re: [Cdk-user] controlling hydrogen display with DepictionGenerator

2019-08-02 Thread John Mayfield
>
> 1. explicit hydrogens are also not rendered


Er... I showed the example with the explicit hydrogens being rendered.
Let's clarify, what does "explicit hydrogen" mean to you?


> 2. Carbon symbols are displayed (as if the withCarbonSymbols() method had
> been called).

The "Visibility" option controls this, but the other answer is just don't
set to zero on carbons?

On Fri, 2 Aug 2019 at 12:09, Egon Willighagen 
wrote:

>
>
> On Fri, Aug 2, 2019 at 11:28 AM John Mayfield 
> wrote:
>
>> Other option You can also use the old *BasicAtomGenerator* which never
>> puts hydrogens on anything...
>>
>
> Documentation on the old generator stack can be found in this copy of the
> Groovy CDK book, "Depiction" chapter:
>
>
> https://figshare.com/articles/Edition_1_4_1_0_of_Groovy_Cheminformatics_with_the_Chemistry_Development_Kit/2057790
>
> Egon
>
> --
> Hi, do you like citation networks? Already 51% of all citations are
> available <https://i4oc.org/> available for innovative new uses
> <https://twitter.com/hashtag/acs2ioc>. Join me in asking the American
> Chemical Society to join the Initiative for Open Citations too
> <https://www.change.org/p/asking-the-american-chemical-society-to-join-the-initiative-for-open-citations>.
>  SpringerNature,
> the RSC and many others already did <https://i4oc.org/#publishers>.
>
> -
> E.L. Willighagen
> Department of Bioinformatics - BiGCaT
> Maastricht University (http://www.bigcat.unimaas.nl/)
> Homepage: http://egonw.github.com/
> Blog: http://chem-bla-ics.blogspot.com/
> PubList: https://www.zotero.org/egonw
> ORCID: -0001-7542-0286 <http://orcid.org/-0001-7542-0286>
> ImpactStory: https://impactstory.org/u/egonwillighagen
>


Re: [Cdk-user] controlling hydrogen display with DepictionGenerator

2019-08-02 Thread John Mayfield
It's not an option - and I would be hesitant to add it as it's "at best
ambiguous" (quoting Brecher's IUPAC :-)). Symyx/Accelrys/BioVia Draw like
to hide the hydrogens by default which is where I think the acceptability
crept in from. You're right that for queries there's a use-case, but the
depiction isn't really set up for rendering queries.

Anyways if you want to do it just add a helper routine that sets the implicit
hydrogen counts to 0, for non aromatics you can put them

C1C[N]CN([H])C1


[image: image.png]
The radicals there are specific to the WebApp so you would just get a plain
N - I may actually make that an option as I don't like it (Noel talked me into
it).

Another option: you can also use the old *BasicAtomGenerator* which never puts
hydrogens on anything...

John
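
A helper of the kind suggested above has to decide which atoms lose their
hydrogen labels. One possible rule, discussed elsewhere in this thread, treats
a carbon with fewer than two heavy-atom neighbours as terminal. The sketch
below shows only that counting step on a plain array-based graph. It does not
use CDK, and the class name, method name and input layout are all made up for
illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class TerminalCarbons {

    /**
     * Returns the indices of carbon atoms that have fewer than two
     * heavy-atom (non-hydrogen) neighbours.
     *
     * atomicNumbers[i] is the element of atom i; each bond is a pair {a, b}
     * of atom indices. This layout is an assumption made for the sketch.
     */
    public static List<Integer> terminalCarbons(int[] atomicNumbers, int[][] bonds) {
        int[] heavyNeighbours = new int[atomicNumbers.length];
        for (int[] bond : bonds) {
            int a = bond[0];
            int b = bond[1];
            // a neighbour counts as heavy when it is anything but hydrogen
            if (atomicNumbers[b] > 1) heavyNeighbours[a]++;
            if (atomicNumbers[a] > 1) heavyNeighbours[b]++;
        }
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < atomicNumbers.length; i++) {
            if (atomicNumbers[i] == 6 && heavyNeighbours[i] < 2) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // propan-1-ol: C0-C1-C2-O3 with one explicit hydrogen H4 on the oxygen
        int[] z = {6, 6, 6, 8, 1};
        int[][] bonds = {{0, 1}, {1, 2}, {2, 3}, {3, 4}};
        System.out.println(terminalCarbons(z, bonds)); // prints [0]
    }
}
```

With a real IAtomContainer the same count would come from walking each atom's
bonds, and the result would decide where the implicit hydrogen count is left
untouched.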

On Thu, 1 Aug 2019 at 18:12, Tim Dudgeon  wrote:

> Can someone point to examples of how to control the display of hydrogens
> when depicting using DepictionGenerator?
>
> It looks like implicit hydrogens are displayed on terminal and hetero
> atoms which is not unreasonable, but what if I ONLY want explicit
> hydrogens to be displayed (e.g. when depicting a query structure) or I
> don't want any hydrogens to be displayed?
>
>
>


Re: [Cdk-user] Stereochemistry resolution

2019-07-30 Thread John Mayfield
The improvements came, I believe, in 1.5.4+:

https://github.com/cdk/cdk/wiki/1.5.4-Release-Notes#stereochemistry-

On Tue, 30 Jul 2019 at 14:19, John Mayfield 
wrote:

> Ah okay, in some versions of 1.5 it's supported. Which subversion are you
> using?
>
> On Tue, 30 Jul 2019 at 09:34, Wehner, Sebastian <
> sebastian.weh...@bruker.com> wrote:
>
>> Thanks for the clarification. And sorry, it was a typo I meant version
>> 1.5. I assume your explanation still holds true for that as well?
>>
>>
>>
>> Sebastian
>>
>>
>>
>> *From:* John Mayfield 
>> *Sent:* Tuesday, July 30, 2019 10:27 AM
>> *To:* Wehner, Sebastian 
>> *Cc:* cdk-user@lists.sourceforge.net
>> *Subject:* Re: [Cdk-user] Stereochemistry resolution
>>
>>
>>
>> No, it wasn't possible. They used different data structures so you could
>> go to/from SMILES and to/from 2D Mol/CML but not from Mol to Smi or Smi to
>> Mol. Please don't use 1.1.5 (I presume as there is no 1.15 version) it's
>> 10+ years old.
>>
>>
>>
>> John
>>
>>
>>
>> On Tue, 30 Jul 2019 at 08:49, Wehner, Sebastian via Cdk-user <
>> cdk-user@lists.sourceforge.net> wrote:
>>
>> Hello,
>>
>>
>>
>> I am trying to produce a smiles string from a molfile, via AtomContainer
>> as an intermediate. Is it possible to properly resolve stereochemistry in
>> CDK version 1.15?
>>
>>
>>
>> In some detail:
>>
>> I tried some approaches with known stereo-isotopes for which I both had
>> the molfile (SDF). The molfile was parsed via MDLV2000Reader to an
>> AtomContainer which in turn was passed to the SmilesGenerator for parsing
>> to smiles. Sadly both molfiles resulted in the same smiles string.
>>
>> I read the paper for CDK v2.0 (
>> https://jcheminf.biomedcentral.com/articles/10.1186/s13321-017-0220-4)
>> which states that in this version the stereochemistry is standardized. But
>> it does not convey whether it wasn’t possible before. So is there a way to
>> resolve stereochemistry in v1.15? And if so, can anyone provide some code
>> examples?
>>
>>
>>
>> Hope someone can shed some light on this,
>>
>> Best Regards
>>
>> Sebastian
>>
>>
>>


Re: [Cdk-user] Stereochemistry resolution

2019-07-30 Thread John Mayfield
Ah okay, in some versions of 1.5 it's supported. Which subversion are you
using?

On Tue, 30 Jul 2019 at 09:34, Wehner, Sebastian 
wrote:

> Thanks for the clarification. And sorry, it was a typo I meant version
> 1.5. I assume your explanation still holds true for that as well?
>
>
>
> Sebastian
>
>
>
> *From:* John Mayfield 
> *Sent:* Tuesday, July 30, 2019 10:27 AM
> *To:* Wehner, Sebastian 
> *Cc:* cdk-user@lists.sourceforge.net
> *Subject:* Re: [Cdk-user] Stereochemistry resolution
>
>
>
> No, it wasn't possible. They used different data structures so you could
> go to/from SMILES and to/from 2D Mol/CML but not from Mol to Smi or Smi to
> Mol. Please don't use 1.1.5 (I presume as there is no 1.15 version) it's
> 10+ years old.
>
>
>
> John
>
>
>
> On Tue, 30 Jul 2019 at 08:49, Wehner, Sebastian via Cdk-user <
> cdk-user@lists.sourceforge.net> wrote:
>
> Hello,
>
>
>
> I am trying to produce a smiles string from a molfile, via AtomContainer
> as an intermediate. Is it possible to properly resolve stereochemistry in
> CDK version 1.15?
>
>
>
> In some detail:
>
> I tried some approaches with known stereo-isotopes for which I both had
> the molfile (SDF). The molfile was parsed via MDLV2000Reader to an
> AtomContainer which in turn was passed to the SmilesGenerator for parsing
> to smiles. Sadly both molfiles resulted in the same smiles string.
>
> I read the paper for CDK v2.0 (
> https://jcheminf.biomedcentral.com/articles/10.1186/s13321-017-0220-4)
> which states that in this version the stereochemistry is standardized. But
> it does not convey whether it wasn’t possible before. So is there a way to
> resolve stereochemistry in v1.15? And if so, can anyone provide some code
> examples?
>
>
>
> Hope someone can shed some light on this,
>
> Best Regards
>
> Sebastian
>
>
>


Re: [Cdk-user] Stereochemistry resolution

2019-07-30 Thread John Mayfield
No, it wasn't possible. They used different data structures so you could go
to/from SMILES and to/from 2D Mol/CML but not from Mol to Smi or Smi to
Mol. Please don't use 1.1.5 (I presume as there is no 1.15 version) it's
10+ years old.

John

On Tue, 30 Jul 2019 at 08:49, Wehner, Sebastian via Cdk-user <
cdk-user@lists.sourceforge.net> wrote:

> Hello,
>
>
>
> I am trying to produce a smiles string from a molfile, via AtomContainer
> as an intermediate. Is it possible to properly resolve stereochemistry in
> CDK version 1.15?
>
>
>
> In some detail:
>
> I tried some approaches with known stereo-isotopes for which I both had
> the molfile (SDF). The molfile was parsed via MDLV2000Reader to an
> AtomContainer which in turn was passed to the SmilesGenerator for parsing
> to smiles. Sadly both molfiles resulted in the same smiles string.
>
> I read the paper for CDK v2.0 (
> https://jcheminf.biomedcentral.com/articles/10.1186/s13321-017-0220-4)
> which states that in this version the stereochemistry is standardized. But
> it does not convey whether it wasn’t possible before. So is there a way to
> resolve stereochemistry in v1.15? And if so, can anyone provide some code
> examples?
>
>
>
> Hope someone can shed some light on this,
>
> Best Regards
>
> Sebastian
>
>


Re: [Cdk-user] Stereochemistry is disregarded when creating SMILES from MOL-File

2019-07-17 Thread John Mayfield
Why were you using that version?

On Wed, 17 Jul 2019 at 06:37, Wehner, Sebastian 
wrote:

> Hi John,
>
>
>
> That is what I suspected. I had some hope that there was still a way to
> properly convert stereochemistry in this version…
>
> Anyways, thanks for your quick answer and explanations.
>
>
>
> Sebastian
>
>
>
> *From:* John Mayfield 
> *Sent:* Tuesday, July 16, 2019 6:05 PM
> *To:* Wehner, Sebastian 
> *Cc:* cdk-user@lists.sourceforge.net
> *Subject:* Re: [Cdk-user] Stereochemistry is disregarded when creating
> SMILES from MOL-File
>
>
>
> Hi Sebastian,
>
>
>
> > I am using CDK version 1.4.17
>
>
>
> That is a very old version and does not convert stereochemistry
> correctly.  Essentially there weren't data structures to represent it so it
> was stored differently for 0D (e.g. SMILES) vs 2D vs 3D. This was all fixed
> about 7 years ago :-). Latest release is 2.2 BTW -
> https://github.com/cdk/cdk/releases
>
>
>
> John
>
>
>
> On Tue, 16 Jul 2019 at 15:04, Wehner, Sebastian via Cdk-user <
> cdk-user@lists.sourceforge.net> wrote:
>
> Hi,
>
>
>
> I could use your help! I am using CDK version 1.4.17 and want to build a
> SMILES string from a mol-file (SDF from lipid maps). The molecule has two
> isomers, but this information should be included in the mol-file, should it
> not?
>
>
>
> Anyways, I pass the mol-file as ByteArrayInputStream into MDLV2000Reader and
> then create a new AtomContainer from this. Iterating over each atom of
> the AtomContainer, using a CDKAtomTypeMatcher to get the IAtomType of the
> atom which I then use to configure this atom with via AtomTypeManipulator
> .configure().
>
>
>
> Finally adding implicit hydrogens to the AtomContainer, creating a
> SmilesGenerator and returning the smiles of the AtomContainer with
> smilesGenerator.createSMILES(AtomContainer).
>
>
>
> However this produces a SMILES of the molecule which disregards the
> stereochemistry. The documentation states, that stereochemistry is taken
> into account (
> http://cdk.github.io/cdk/1.4/docs/api/org/openscience/cdk/smiles/SmilesGenerator.html).
> Am I missing something?
>
>
>
> Would be great if someone could help! Added the link to lipid maps for the
> example: http://www.lipidmaps.org/data/LMSDRecord.php?LMID=LMGL02010378.
>
>
>
>
>
> Best regards,
>
> Sebastian Wehner
>
>
>


Re: [Cdk-user] Stereochemistry is disregarded when creating SMILES from MOL-File

2019-07-16 Thread John Mayfield
Hi Sebastian,

> I am using CDK version 1.4.17

That is a very old version and does not convert stereochemistry correctly.
Essentially there weren't data structures to represent it so it was stored
differently for 0D (e.g. SMILES) vs 2D vs 3D. This was all fixed about 7
years ago :-). Latest release is 2.2 BTW -
https://github.com/cdk/cdk/releases

John

On Tue, 16 Jul 2019 at 15:04, Wehner, Sebastian via Cdk-user <
cdk-user@lists.sourceforge.net> wrote:

> Hi,
>
>
>
> I could use your help! I am using CDK version 1.4.17 and want to build a
> SMILES string from a mol-file (SDF from lipid maps). The molecule has two
> isomers, but this information should be included in the mol-file, should it
> not?
>
>
>
> Anyways, I pass the mol-file as ByteArrayInputStream into MDLV2000Reader and
> then create a new AtomContainer from this. Iterating over each atom of
> the AtomContainer, using a CDKAtomTypeMatcher to get the IAtomType of the
> atom which I then use to configure this atom with via AtomTypeManipulator
> .configure().
>
>
>
> Finally adding implicit hydrogens to the AtomContainer, creating a
> SmilesGenerator and returning the smiles of the AtomContainer with
> smilesGenerator.createSMILES(AtomContainer).
>
>
>
> However this produces a SMILES of the molecule which disregards the
> stereochemistry. The documentation states, that stereochemistry is taken
> into account (
> http://cdk.github.io/cdk/1.4/docs/api/org/openscience/cdk/smiles/SmilesGenerator.html).
> Am I missing something?
>
>
>
> Would be great if someone could help! Added the link to lipid maps for the
> example: http://www.lipidmaps.org/data/LMSDRecord.php?LMID=LMGL02010378.
>
>
>
>
>
> Best regards,
>
> Sebastian Wehner
>
>


Re: [Cdk-user] Enantiomer generator?

2019-05-08 Thread John Mayfield
No there isn't and I'm struggling to think of a use-case so I'll first ask
what's your actual end goal as there is likely a more efficient approach.
For example testing if two compounds are enantiomers does not require
enumeration.

But if you really want to enumerate - I would just flip them as you
suggested and think about whether you need to worry about sterically hindered
cases. If you care enough you can handle the common ones, e.g. bicyclo, with a
simple check. IIRC Greg had some routines in RDKit to enumerate and filter
them out by doing some 3D geometry calc, personally I feel this is too
expensive for the large numbers that can be generated but horses for courses.

John
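
To make the "just flip them" idea concrete, here is a minimal sketch that works
on a SMILES string rather than on CDK objects. It swaps every @ with @@ and
vice versa, which inverts every tetrahedral centre and so writes the mirror
image. It assumes the input only uses the plain @ and @@ marks (no extended
classes such as @TH1 or @SP1), and the class and method names are made up for
illustration.

```java
public class MirrorSmiles {

    /**
     * Returns a SMILES string with every tetrahedral mark inverted:
     * "@@" becomes "@" and a single "@" becomes "@@".
     * Double-bond marks ('/' and '\') are left alone because cis/trans
     * geometry does not change on reflection.
     */
    public static String mirror(String smiles) {
        StringBuilder out = new StringBuilder(smiles.length() + 4);
        int i = 0;
        while (i < smiles.length()) {
            char c = smiles.charAt(i);
            if (c == '@') {
                boolean doubled = i + 1 < smiles.length() && smiles.charAt(i + 1) == '@';
                out.append(doubled ? "@" : "@@");
                i += doubled ? 2 : 1; // skip both characters of "@@"
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // one enantiomer of alanine in, the other out
        System.out.println(mirror("N[C@@H](C)C(=O)O")); // prints N[C@H](C)C(=O)O
    }
}
```

Comparing the two strings directly says nothing about whether the structures
differ: for a meso compound the mirrored string describes the same molecule.
Deciding that needs canonical SMILES or a graph comparison from a toolkit.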

On Wed, 8 May 2019 at 18:44, Daniel Katzel  wrote:

> Hello all
>   Does CDK have a way to generate enantiomers of a given IAtomContainer? I
> guess one could go to each stereo center and flip up and down bonds but I'm
> sure that get hairy very quickly if some are connected to multiple chiral
> atoms.
>
> Thanks


Re: [Cdk-user] CDK2.0 with python

2019-04-08 Thread John Mayfield
Just asked Noel and the main code base is up to date with CDK 2+ it's just
the doc/release which is out of date. If you pull the *master* branch you
should be able to use CDK 2+ no problem:
https://github.com/cinfony/cinfony

On Mon, 8 Apr 2019 at 09:20, Ganapati Natarajan  wrote:

> Thanks.
>
> Ganapati
>
> On Mon, 8 Apr 2019 at 12:59, John Mayfield 
> wrote:
>
>> I thought there was a Cinfony version using CDK 2+... but can't find it
>> now or even the repo as google code no longer exists. CC'ing Noel.
>>
>> On Mon, 8 Apr 2019 at 07:52, Ganapati Natarajan 
>> wrote:
>>
>>> Dear all,
>>>
>>> I wish to use the CDK 2.0 from python. I noticed on the cinfony website
>>> that the CDK 1.4 version can be used with cinfony. Please advise on how to
>>> use CDK2.0 with python.
>>>
>>> Thanks in advance,
>>>
>>> Ganapati


Re: [Cdk-user] CDK2.0 with python

2019-04-08 Thread John Mayfield
I thought there was a Cinfony version using CDK 2+... but can't find it now
or even the repo as google code no longer exists. CC'ing Noel.

On Mon, 8 Apr 2019 at 07:52, Ganapati Natarajan  wrote:

> Dear all,
>
> I wish to use the CDK 2.0 from python. I noticed on the cinfony website
> that the CDK 1.4 version can be used with cinfony. Please advise on how to
> use CDK2.0 with python.
>
> Thanks in advance,
>
> Ganapati


Re: [Cdk-user] How to add hydrogens with 3D coordinate?

2019-02-23 Thread John Mayfield
Unfortunately there is no easy way to do this ATM other than regenerating
3D coordinates and that support is pretty limited in CDK. Of course the
question is then do you want minimised hydrogens or any old reasonably
valid positions. As a first approximation you can set the hydrogen
coordinates to the same as the atom they are attached to.

However I think it's a reasonable thing to do if a 3D (or 2D) molecule
comes in and will add something like OESet3DHydrogenGeom.
Please can you add a GitHub issue for this.

Thanks,
John
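
The "first approximation" mentioned above can be sketched as follows. This is
plain Java on arrays, not CDK code: the class name, method name and input
layout are assumptions made for illustration. Each atom without coordinates
is given a copy of the coordinates of the atom it is attached to, so the
hydrogens sit exactly on top of their parents and are only useful as
placeholders, for example as a starting point for a later minimisation.

```java
public class HydrogenPlaceholder {

    /**
     * Gives every atom that lacks coordinates a copy of its parent's position.
     *
     * coords[i] is {x, y, z} for atom i, or null when no coordinates are set.
     * parent[i] is the index of the atom that atom i is attached to, or -1.
     * Returns how many atoms were filled in.
     */
    public static int fillMissing(double[][] coords, int[] parent) {
        int filled = 0;
        for (int i = 0; i < coords.length; i++) {
            boolean missing = coords[i] == null;
            boolean hasPlacedParent = parent[i] >= 0 && coords[parent[i]] != null;
            if (missing && hasPlacedParent) {
                // copy rather than share the array so later moves stay independent
                coords[i] = coords[parent[i]].clone();
                filled++;
            }
        }
        return filled;
    }

    public static void main(String[] args) {
        // one heavy atom with coordinates and two hydrogens without
        double[][] coords = {{1.0, 2.0, 3.0}, null, null};
        int[] parent = {-1, 0, 0};
        System.out.println(fillMissing(coords, parent)); // prints 2
    }
}
```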

On Sat, 16 Feb 2019 at 12:38, love_software0 via Cdk-user <
cdk-user@lists.sourceforge.net> wrote:

> Dear all,
>
> The Chemistry Development Kit(CDK) is very useful for my current work.
> However, I meet a problem: when I read and add hydrogens on the sybyl mol2
> format 3D molecule by using CDK, it seems the added hydrogens have no
> coordinates. I use the code as below:
>
> "AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(mol);
> CDKHydrogenAdder.getInstance(mol.getBuilder()).addImplicitHydrogens(mol);
> AtomContainerManipulator.convertImplicitToExplicitHydrogens(mol);
> "
> The added hydrogens with coordinates is very important for my current
> project. I had checked the API of CDK, but I can not find the suitable way
> to solve this problem.
>
> So could anyone give me some code or suggestions on adding hydrogens
> with 3D coordinates on molecules?
>
> Thanks for your help.
>
> Sincerely,
>
> Qifeng
>


Re: [Cdk-user] Tow problems about the calculation of molecular weight

2019-02-14 Thread John Mayfield
As an aside we are thinking of simplifying this to a single API point
*getMass(mol, opt)* where the option lets you choose what you want.

The existing API points will still be valid but defer to this method.

John

On Thu, 14 Feb 2019 at 09:59, John Mayfield 
wrote:

> Please note the correct way to get Molecular Weight is:
>
> AtomContainerManipulator.getMolecularWeight(mol);
>
>
> We are aware of the issue with the MolecularWeight descriptor - please see
> the issue tracker.
>
> On Thu, 14 Feb 2019 at 08:44, Stesycki, Manuel <
> stesy...@mpi-muelheim.mpg.de> wrote:
>
>> Dear love_software0,
>>
>> i am running CDK Version 2.2.
>>
>> As a test structure i used Benzene (CAS 71-43-2)
>>
>> 1) I calculate the mass by using:
>> *double mw = AtomContainerManipulator.getMolecularWeight(mol);*
>>
>> 2) To calculate the monoIsotopicMass i use:
>>
>> *IMolecularFormula form =
>> MolecularFormulaManipulator.getMolecularFormula(mol);*
>> *double mw = MolecularFormulaManipulator.getTotalExactMass(form);*
>>
>> The methods from 1) and 2) calculate the following results:
>>
>> 1) 78.11205990368276
>> 2)  78.046950192
>> your code) 78.04695024
>>
>> I attached 2 screen shots. One from SciFinder which states an mw of 78.11.
>> The other one is from ChemDraw V18. There the exact mass is equal to your
>> result and the molecular weight (Mol.Wt.) matches the SciFinder value.
>>
>> Best regards,
>>Manuel Stesycki
>>
>> IT
>>0208 / 306-2146
>>Physikbau, Büro 117
>>stesy...@mpi-muelheim.mpg.de
>>
>> Max-Planck-Institut für Kohlenforschung
>>Kaiser-Wilhelm-Platz 1
>>D-45470 Mülheim an der Ruhr
>>http://www.kofo.mpg.de/de


Re: [Cdk-user] Tow problems about the calculation of molecular weight

2019-02-14 Thread John Mayfield
Please note the correct way to get Molecular Weight is:

AtomContainerManipulator.getMolecularWeight(mol);


We are aware of the issue with the MolecularWeight descriptor - please see
the issue tracker.
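
For readers wondering why the thread reports two different numbers for
benzene: molecular weight uses abundance-averaged atomic masses, while the
exact (monoisotopic) mass uses only the most abundant isotope of each element.
The sketch below reproduces both by hand for a CxHy formula. It does not call
CDK, the class and method names are made up, and the atomic masses are assumed
values typed in for illustration, so the last digits may differ from what CDK's
own isotope tables give.

```java
public class BenzeneMass {

    // Assumed abundance-averaged atomic masses (illustrative values)
    static final double C_AVERAGE = 12.0107;
    static final double H_AVERAGE = 1.00794;

    // Assumed masses of the most abundant isotopes, 12C and 1H
    static final double C_12 = 12.0;
    static final double H_1 = 1.00782503207;

    /** Molecular weight of CxHy from abundance-averaged atomic masses. */
    public static double averageMass(int carbons, int hydrogens) {
        return carbons * C_AVERAGE + hydrogens * H_AVERAGE;
    }

    /** Monoisotopic (exact) mass of CxHy from the most abundant isotopes. */
    public static double monoisotopicMass(int carbons, int hydrogens) {
        return carbons * C_12 + hydrogens * H_1;
    }

    public static void main(String[] args) {
        // benzene, C6H6
        System.out.printf("average:      %.4f%n", averageMass(6, 6));
        System.out.printf("monoisotopic: %.6f%n", monoisotopicMass(6, 6));
    }
}
```

The first value lands close to the 78.11 quoted from SciFinder and the second
close to the 78.04695 exact mass, which is the distinction between
getMolecularWeight and getTotalExactMass in the quoted message.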

On Thu, 14 Feb 2019 at 08:44, Stesycki, Manuel 
wrote:

> Dear love_software0,
>
> i am running CDK Version 2.2.
>
> As a test structure i used Benzene (CAS 71-43-2)
>
> 1) I calculate the mass by using:
> *double mw = AtomContainerManipulator.getMolecularWeight(mol);*
>
> 2) To calculate the monoIsotopicMass i use:
>
> *IMolecularFormula form =
> MolecularFormulaManipulator.getMolecularFormula(mol);*
> *double mw = MolecularFormulaManipulator.getTotalExactMass(form);*
>
> The methods from 1) and 2) calculate the following results:
>
> 1) 78.11205990368276
> 2)  78.046950192
> your code) 78.04695024
>
> I attached 2 screen shots. One from SciFinder which states an mw of 78.11.
> The other on from ChemDraw V18. There the exact Mass is equal to your
> result and the molecular weight (Mol.Wt.) matches the SciFinder value.
>
> Best regards,
>Manuel Stesycki
>
> IT
>0208 / 306-2146
>Physikbau, Büro 117
>stesy...@mpi-muelheim.mpg.de
>
> Max-Planck-Institut für Kohlenforschung
>Kaiser-Wilhelm-Platz 1
>D-45470 Mülheim an der Ruhr
>http://www.kofo.mpg.de/de
> ___
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user
>


Re: [Cdk-user] CDK v2.2

2018-10-30 Thread John Mayfield
On (2) you can also just remove all the Sgroup info, likely you're not even
using it.

mol.setProperty(CDKConstants.CTAB_SGROUPS, null);

On Tue, 30 Oct 2018 at 16:15, John Mayfield 
wrote:

> 1) You can just include cdk-legacy and use the existing method, but the
> functionality was just a convenience, the same as getMinMax(container) but
> without the Java AWT dependency, which caused problems for Android/SWT. IIRC
> this was the only place AWT was used in the core package. If you just want
> the width/height, use get2DDimension. Note this was 4+ years ago :-)
> https://github.com/cdk/cdk/commit/214785ce18e2d06f1ba7d9fddc82c0ea9753a385#diff-9a119f1ec045c70b21aa694d01bbc773
> 2) Looks like a bug, but you really really really should not be using
> clone.
>
> On Tue, 30 Oct 2018 at 15:51, Syed Asad Rahman  wrote:
>
>> Thanks.
>>
>>
>>
>> I have started to play with the new release.
>>
>> Got few regression which is fine with API changes.
>>
>>
>>
>> Any pointer please?
>>
>>
>>
>> Q1) What is the equivalent of GeometryTools.getRectangle2D in the new
>> GeometryUtil?
>>
>> import static org.openscience.cdk.geometry.GeometryTools.getRectangle2D;
>>
>> Q2)
>>
>> IAtomContainer clone = org.clone();
>>
>>
>>
>> Throws java.lang.ClassCastException: java.util.ArrayList cannot be cast
>> to org.openscience.cdk.sgroup.SgroupBracket
>>
>> java.lang.ClassCastException: java.util.ArrayList cannot be cast to
>> org.openscience.cdk.sgroup.SgroupBracket
>>
>>     at
>> org.openscience.cdk.tools.manipulator.SgroupManipulator.copy(SgroupManipulator.java:108)
>>
>> at
>> org.openscience.cdk.AtomContainer.clone(AtomContainer.java:1408)
>>
>>
>>
>> *From: *John Mayfield 
>> *Date: *Tuesday, 30 October 2018 at 15:39
>> *To: *Syed Asad Rahman 
>> *Cc: *cdkuser 
>> *Subject: *Re: [Cdk-user] CDK v2.2
>>
>>
>>
>> Yes, but there have been no changes to SilentChemObjectBuilder...?
>>
>>
>>
>> - I did try testing RDT with the new AtomContainer APIs but your tests
>> took too long to run and I have stuff to do :-).
>>
>> - Likewise I did the same with Ambit but there are integration tests with
>> dependencies on DBs etc so difficult to see if there is actually anything
>> wrong. The bigger problem in Ambit is extended types (e.g.
>> SuppleAtomContainer), which need to implement some new methods. Often it's
>> trivial convenience stuff that could be resolved with Java 8 and default
>> method implementations on interfaces, but we're currently on Java 7 so
>> that's not an option.
>>
>>
>>
>> John
>>
>>
>>
>> On Tue, 30 Oct 2018 at 14:47, Syed Asad Rahman  wrote:
>>
>> Thank you John and Developers!
>>
>>
>>
>> This is a fantastic news!
>>
>> One quick question before I pull it into SMSD and RDT - Is 
>> SilentChemObjectBuilder thread safe?
>>
>> Best wishes,
>>
>> -Asad
>>
>>
>>
>> *From: *John Mayfield 
>> *Date: *Tuesday, 30 October 2018 at 12:01
>> *To: *cdkuser 
>> *Subject: *[Cdk-user] CDK v2.2
>>
>>
>>
>> Dear CDK users,
>>
>>
>>
>> CDK 2.2 is now released and can be obtained from Maven central or the
>> GitHub <https://github.com/cdk/cdk/releases/tag/cdk-2.2> site. The full 
>> release
>> notes <https://github.com/cdk/cdk/wiki/2.2-Release-Notes> provide
>> details on the new features and changes.
>>
>>
>>
>> As noted in v2.1/v2.1.1 the AtomContainer2
>> <https://github.com/cdk/cdk/wiki/AtomContainer2> is now the default so
>> if you didn't try running v2.1/v2.1.1 with the flag
>> CdkUseLegacyAtomContainer=false you may see some breakages. As highlighted on
>> the wiki page <https://github.com/cdk/cdk/wiki/AtomContainer2> this
>> normally just requires addAtom/Bond statements be reordered and using
>> ``Objects.equals(container1, container2)`` instead of reference comparison
>> (container1 == container2).
>>
>>
>>
>> - John
>>
>>
>>
>>


Re: [Cdk-user] CDK v2.2

2018-10-30 Thread John Mayfield
It's more that clone() is an indication of bad style. Unfortunately a lot
of the CDK (particularly the QSAR code) is built on the premise this is
easy and cheap. Also all our *clone()* implementations throw
*CloneNotSupportedException* when they shouldn't; this leads to ugly
try/catch downstream.

It is true there isn't a viable alternative for a deep copy and having an
*AtomContainerManipulator.copy()* for example would help; ideally the copy
constructor should have been a deep copy (it's currently shallow).
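
To make the shallow/deep distinction concrete, here is a small plain-Java sketch. The Atom and Container classes are hypothetical stand-ins that I made up for illustration, not CDK types, and the copy methods are just one way such an API could look. Note that neither copy method declares a checked exception, which avoids the try/catch problem mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

final class CopySketch {
    /** Hypothetical stand-in for an atom; not a CDK class. */
    static final class Atom {
        String symbol;
        Atom(String symbol) { this.symbol = symbol; }
        Atom(Atom other) { this.symbol = other.symbol; } // copy constructor
    }

    /** Hypothetical stand-in for a container of atoms; not a CDK class. */
    static final class Container {
        final List<Atom> atoms = new ArrayList<>();

        /** Shallow copy: a new list, but the same Atom objects are shared. */
        static Container shallowCopy(Container src) {
            Container c = new Container();
            c.atoms.addAll(src.atoms);
            return c;
        }

        /** Deep copy: every Atom is duplicated, so the copy is independent. */
        static Container deepCopy(Container src) {
            Container c = new Container();
            for (Atom a : src.atoms) c.atoms.add(new Atom(a));
            return c;
        }
    }

    public static void main(String[] args) {
        Container original = new Container();
        original.atoms.add(new Atom("C"));

        Container shallow = Container.shallowCopy(original);
        Container deep = Container.deepCopy(original);

        original.atoms.get(0).symbol = "N";              // mutate the original
        System.out.println(shallow.atoms.get(0).symbol); // N: the atom is shared
        System.out.println(deep.atoms.get(0).symbol);    // C: the atom was duplicated
    }
}
```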

John


On Tue, 30 Oct 2018 at 16:50, Christoph Steinbeck <
christoph.steinb...@uni-jena.de> wrote:

>
> > On 30. Oct 2018, at 17:15, John Mayfield 
> wrote:
> >
> > 2) Looks like a bug, but you really really really should not be using
> clone.
>
> You also say that in https://github.com/cdk/cdk/wiki/AtomContainer2,
> which is great guidance, apart from not indicating alternatives to cloning
> for the less enlightened :)
>
> The only alternative to cloning is to write code which comes down to a
> custom clone method, or not? i.e. you create a new AtomContainer and copy
> over the objects that you need for your algorithm, by, say, instantiating
> new atoms with identical properties, and adding them to the AC.
>
> Is your argument that clone does a lot more work than one might need in
> one's specific case?
>
> Kind regards,
>
> Chris
>
> —
> Prof. Dr. Christoph Steinbeck
> Analytical Chemistry - Cheminformatics and Chemometrics
> Friedrich-Schiller-University Jena, Germany
> Phone Secretariat: +49-3641-948171
> http://cheminf.uni-jena.de
> http://orcid.org/-0001-6966-0814
>
> What is man but that lofty spirit - that sense of enterprise.
> ... Kirk, "I, Mudd," stardate 4513.3..
>
>


Re: [Cdk-user] CDK v2.2

2018-10-30 Thread John Mayfield
1) You can just include cdk-legacy and use the existing method, but the
functionality was just a convenience, the same as getMinMax(container) but
without the Java AWT dependency, which caused problems for Android/SWT. IIRC
this was the only place AWT was used in the core package. If you just want
the width/height, use get2DDimension. Note this was 4+ years ago :-)
https://github.com/cdk/cdk/commit/214785ce18e2d06f1ba7d9fddc82c0ea9753a385#diff-9a119f1ec045c70b21aa694d01bbc773
2) Looks like a bug, but you really really really should not be using
clone.

On Tue, 30 Oct 2018 at 15:51, Syed Asad Rahman  wrote:

> Thanks.
>
>
>
> I have started to play with the new release.
>
> Got few regression which is fine with API changes.
>
>
>
> Any pointer please?
>
>
>
> Q1) What is the equivalent of GeometryTools.getRectangle2D in the new
> GeometryUtil?
>
> import static org.openscience.cdk.geometry.GeometryTools.getRectangle2D;
>
> Q2)
>
> IAtomContainer clone = org.clone();
>
>
>
> Throws java.lang.ClassCastException: java.util.ArrayList cannot be cast to
> org.openscience.cdk.sgroup.SgroupBracket
>
> java.lang.ClassCastException: java.util.ArrayList cannot be cast to
> org.openscience.cdk.sgroup.SgroupBracket
>
> at
> org.openscience.cdk.tools.manipulator.SgroupManipulator.copy(SgroupManipulator.java:108)
>
>     at
> org.openscience.cdk.AtomContainer.clone(AtomContainer.java:1408)
>
>
>
> *From: *John Mayfield 
> *Date: *Tuesday, 30 October 2018 at 15:39
> *To: *Syed Asad Rahman 
> *Cc: *cdkuser 
> *Subject: *Re: [Cdk-user] CDK v2.2
>
>
>
> Yes, but there have been no changes to SilentChemObjectBuilder...?
>
>
>
> - I did try testing RDT with the new AtomContainer APIs but your tests
> took too long to run and I have stuff to do :-).
>
> - Likewise I did the same with Ambit but there are integration tests with
> dependencies on DBs etc so difficult to see if there is actually anything
> wrong. The bigger problem in Ambit is extended types (e.g.
> SuppleAtomContainer), which need to implement some new methods. Often it's
> trivial convenience stuff that could be resolved with Java 8 and default
> method implementations on interfaces, but we're currently on Java 7 so
> that's not an option.
>
>
>
> John
>
>
>
> On Tue, 30 Oct 2018 at 14:47, Syed Asad Rahman  wrote:
>
> Thank you John and Developers!
>
>
>
> This is a fantastic news!
>
> One quick question before I pull it into SMSD and RDT - Is 
> SilentChemObjectBuilder thread safe?
>
> Best wishes,
>
> -Asad
>
>
>
> *From: *John Mayfield 
> *Date: *Tuesday, 30 October 2018 at 12:01
> *To: *cdkuser 
> *Subject: *[Cdk-user] CDK v2.2
>
>
>
> Dear CDK users,
>
>
>
> CDK 2.2 is now released and can be obtained from Maven central or the
> GitHub <https://github.com/cdk/cdk/releases/tag/cdk-2.2> site. The full 
> release
> notes <https://github.com/cdk/cdk/wiki/2.2-Release-Notes> provide details
> on the new features and changes.
>
>
>
> As noted in v2.1/v2.1.1 the AtomContainer2
> <https://github.com/cdk/cdk/wiki/AtomContainer2> is now the default so if
> you didn't try running v2.1/v2.1.1 with the flag
> CdkUseLegacyAtomContainer=false you may see some breakages. As highlighted on
> the wiki page <https://github.com/cdk/cdk/wiki/AtomContainer2> this
> normally just requires addAtom/Bond statements be reordered and using
> ``Objects.equals(container1, container2)`` instead of reference comparison
> (container1 == container2).
>
>
>
> - John
>
>
>
>


Re: [Cdk-user] CDK v2.2

2018-10-30 Thread John Mayfield
Yes, but there have been no changes to SilentChemObjectBuilder...?

- I did try testing RDT with the new AtomContainer APIs but your tests took
too long to run and I have stuff to do :-).
- Likewise I did the same with Ambit but there are integration tests with
dependencies on DBs etc., so it is difficult to see if there is actually
anything wrong. The bigger problem in Ambit is extended types (e.g.
SuppleAtomContainer), which need to implement some new methods. Often it's
trivial convenience stuff that could be resolved with Java 8 and default
method implementations on interfaces, but we're currently on Java 7 so
that's not an option.

John

On Tue, 30 Oct 2018 at 14:47, Syed Asad Rahman  wrote:

> Thank you John and Developers!
>
>
>
> This is a fantastic news!
>
> One quick question before I pull it into SMSD and RDT - Is 
> SilentChemObjectBuilder thread safe?
>
> Best wishes,
>
> -Asad
>
>
>
> *From: *John Mayfield 
> *Date: *Tuesday, 30 October 2018 at 12:01
> *To: *cdkuser 
> *Subject: *[Cdk-user] CDK v2.2
>
>
>
> Dear CDK users,
>
>
>
> CDK 2.2 is now released and can be obtained from Maven central or the
> GitHub <https://github.com/cdk/cdk/releases/tag/cdk-2.2> site. The full 
> release
> notes <https://github.com/cdk/cdk/wiki/2.2-Release-Notes> provide details
> on the new features and changes.
>
>
>
> As noted in v2.1/v2.1.1 the AtomContainer2
> <https://github.com/cdk/cdk/wiki/AtomContainer2> is now the default so if
> you didn't try running v2.1/v2.1.1 with the flag
> CdkUseLegacyAtomContainer=false you may see some breakages. As highlighted on
> the wiki page <https://github.com/cdk/cdk/wiki/AtomContainer2> this
> normally just requires addAtom/Bond statements be reordered and using
> ``Objects.equals(container1, container2)`` instead of reference comparison
> (container1 == container2).
>
>
>
> - John
>
>
>


[Cdk-user] CDK v2.2

2018-10-30 Thread John Mayfield
Dear CDK users,

CDK 2.2 is now released and can be obtained from Maven Central or the GitHub
<https://github.com/cdk/cdk/releases/tag/cdk-2.2> site. The full release
notes <https://github.com/cdk/cdk/wiki/2.2-Release-Notes> provide details
on the new features and changes.

As noted in v2.1/v2.1.1, the AtomContainer2
<https://github.com/cdk/cdk/wiki/AtomContainer2> is now the default, so if
you didn't try running v2.1/v2.1.1 with the flag
CdkUseLegacyAtomContainer=false you may see some breakages. As highlighted on
the wiki page <https://github.com/cdk/cdk/wiki/AtomContainer2>, this
normally just requires addAtom/Bond statements to be reordered and using
``Objects.equals(container1, container2)`` instead of reference comparison
(container1 == container2).
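
As a rough illustration of why reference comparison can break: my understanding from the wiki page is that the new container may hand back lightweight wrapper objects rather than the exact instance that was added. The sketch below uses hypothetical Atom and AtomRef classes that I wrote for this example (they are not the CDK implementations) to show `==` failing while `Objects.equals` still succeeds.

```java
import java.util.Objects;

final class RefEqualitySketch {
    /** Hypothetical atom type; not a CDK class. */
    static class Atom {
        final String symbol;
        Atom(String symbol) { this.symbol = symbol; }

        /** The underlying object; a plain atom is its own underlying object. */
        Atom unwrap() { return this; }

        // Equality is defined on the unwrapped object so it stays symmetric.
        @Override public final boolean equals(Object o) {
            return o instanceof Atom && unwrap() == ((Atom) o).unwrap();
        }
        @Override public final int hashCode() {
            return System.identityHashCode(unwrap());
        }
    }

    /** Hypothetical wrapper that a container might return instead of the original. */
    static final class AtomRef extends Atom {
        private final Atom wrapped;
        AtomRef(Atom wrapped) { super(wrapped.symbol); this.wrapped = wrapped; }
        @Override Atom unwrap() { return wrapped.unwrap(); }
    }

    public static void main(String[] args) {
        Atom atom = new Atom("C");
        Atom fromContainer = new AtomRef(atom); // what a container might hand back

        System.out.println(fromContainer == atom);               // false
        System.out.println(Objects.equals(fromContainer, atom)); // true
    }
}
```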

- John


Re: [Cdk-user] Question on CDK and a small documentation

2018-10-29 Thread John Mayfield
I am still curious why; this kind of dubious manipulation is how errors
start propagating. Anyway, I can only ask twice :-). These are not
depiction options but you can achieve it by modifying the molecule.

1) I think what you're asking is to get the "parent" molecule. You should
define a list of the salts (counter ions) you want to remove and just
remove them from the IAtomContainer. A common hack is to remove everything
but the largest component (OEChem has the amusingly named
OETheFunctionFormerlyKnownAsStripSalts). In CDK you would use the
ConnectivityChecker to partition the molecule into components, sort them
by size and take the largest one. Note you may end up with a non-neutral
form if care is not taken, and if there are two components of the same size
you should decide what to do.

2) Remove the stereochemistry,
*container.setStereoElements(Collections.emptyList())*, prior to generating
coordinates. Or if you already have coordinates, iterate over the bonds and
*setStereo(IBond.Stereo.NONE)*. Depending on what you're asking you may
want to set the bond order to single (*setOrder(IBond.Order.SINGLE)*) too;
note this will then mess up the valence and you'd have radicals, so you'd
have to sort that out etc. Again, removing information from the molecule is
not a good idea.
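
For point 1, the "keep the largest component" hack boils down to a connected-components search on the bond graph. The following is a plain-Java sketch over atom indices, deliberately not using CDK classes; the class and method names are mine. It reports a tie instead of silently picking one component, since that case needs a decision.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class LargestComponentSketch {
    /** Labels each atom with a component id using breadth-first search; bonds[i] = {a, b}. */
    static int[] components(int atomCount, int[][] bonds) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < atomCount; i++) adj.add(new ArrayList<>());
        for (int[] b : bonds) {
            adj.get(b[0]).add(b[1]);
            adj.get(b[1]).add(b[0]);
        }
        int[] label = new int[atomCount];
        Arrays.fill(label, -1);
        int next = 0;
        for (int start = 0; start < atomCount; start++) {
            if (label[start] != -1) continue; // already visited
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(start);
            label[start] = next;
            while (!queue.isEmpty()) {
                int v = queue.poll();
                for (int w : adj.get(v)) {
                    if (label[w] == -1) { label[w] = next; queue.add(w); }
                }
            }
            next++;
        }
        return label;
    }

    /** Id of the strictly largest component, or -1 on a tie for largest (or no atoms). */
    static int largest(int[] label) {
        int n = 0;
        for (int l : label) n = Math.max(n, l + 1);
        int[] size = new int[n];
        for (int l : label) size[l]++;
        int best = -1, bestSize = 0;
        boolean tie = false;
        for (int i = 0; i < n; i++) {
            if (size[i] > bestSize) { best = i; bestSize = size[i]; tie = false; }
            else if (size[i] == bestSize) tie = true;
        }
        return tie ? -1 : best;
    }

    public static void main(String[] args) {
        // Heavy atoms of sodium acetate: atoms 0-3 are the acetate, atom 4 is the sodium.
        int[] label = components(5, new int[][] {{0, 1}, {1, 2}, {1, 3}});
        System.out.println(Arrays.toString(label) + " largest=" + largest(label));
    }
}
```

Charge neutrality is not checked here, so the caveat about ending up with a non-neutral form still applies.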


Re: [Cdk-user] Question on CDK and a small documentation

2018-10-29 Thread John Mayfield
Please use the *cdk-user* mailing list (cc'd) for such questions in future.
People other than me can help, and if someone has the same question they'll
get to see the answer too (it's also archived).

> Is there any possibilities to turn off the counter ion depiction in CDK and
> can we use a just lines for all types of bonds, rather than wedged bonds
> and dashed style bonds?


Yes and yes. But I'd like a bit more detail: was there something wrong
that needs fixing?

> Under Bond Count Descriptor the documentation says that you can use
> parameter “a” for aromatic bond counts, but after CDK 2.0 There is a new
> class called AromaticBondCountdescriptor made available. So now a developer
> cannot use that parameter. Also now a developer can use “q” for quadruple
> bonds, which should be added in the API documentation.


Please feel free to add a patch via GitHub that updates the JavaDoc.

John


On Mon, 29 Oct 2018 at 14:39, Kohulan Rajan 
wrote:

> Dear John,
>
>
>
> Hope you are doing well. I am Kohulan Rajan , currently a Ph.D student
> working Under Prof.C.Steinbeck.
>
>
>
> This is regarding a question regarding CDK and a small documentation in
> the new release.
>
>
>
> Question.
>
> Is there any possibilities to turn off the counter ion
> depiction in CDK, and can we use a just lines for all types of bonds,
> rather than wedged bonds and dashed style bonds?
>
>
>
> Update,
>
>
>
> Under Bond Count Descriptor the documentation says that you can use
> parameter “a” for aromatic bond counts, but after CDK 2.0 There is a new
> class called AromaticBondCountdescriptor made available. So now a developer
> cannot use that parameter.
>
> Also now a developer can use “q” for quadruple bonds, which should be
> added in the API documentation.
>
>
>
>
>
> Awaiting for your reply.
>
>
>
> Kind Regards,
> ~Kohulan.R
>
> ___
>
> Kohulan Rajan
> PhD Student
> Faculty of Chemistry and Geosciences
> Institute of Inorganic and Analytical Chemistry - Cheminformatics and
> Chemometrics
> Friedrich-Schiller-University
> Lessingstraße 8, 07743 Jena , Germany
>
> http://cheminf.uni-jena.de
> Phone : +49 3641 948783
>
> “It is our choices that show what we truly are, far more than our
> abilities.” - Albus Dumbledore
>
>
>


Re: [Cdk-user] Aligning Reaction using Fixed Substructure

2018-10-11 Thread John Mayfield
Nice,

For historical reasons, CDK uses unit bond length 1.5 instead of 1.
Something to do with C-C bonds but that really only makes sense for 3D.
Rescale it like this:

GeometryUtil.scaleMolecule(fixedSubstructure,
        1.5 / GeometryUtil.getBondLengthMedian(fixedSubstructure));
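
To spell out the arithmetic behind that call, here is a plain-Java sketch over raw coordinate arrays rather than CDK objects. The class and method names are mine, it scales about the origin for simplicity, and I have not checked how CDK picks the median for an even number of bonds, so treat those details as assumptions.

```java
import java.util.Arrays;

final class BondLengthScaleSketch {
    /** Median Euclidean bond length; coords[i] = {x, y}, bonds[j] = {a, b}. */
    static double medianBondLength(double[][] coords, int[][] bonds) {
        if (bonds.length == 0) throw new IllegalArgumentException("no bonds");
        double[] len = new double[bonds.length];
        for (int i = 0; i < bonds.length; i++) {
            double dx = coords[bonds[i][0]][0] - coords[bonds[i][1]][0];
            double dy = coords[bonds[i][0]][1] - coords[bonds[i][1]][1];
            len[i] = Math.hypot(dx, dy);
        }
        Arrays.sort(len);
        int mid = len.length / 2;
        // For an even count, average the two middle values (an assumption of this sketch).
        return len.length % 2 == 1 ? len[mid] : (len[mid - 1] + len[mid]) / 2.0;
    }

    /** Scales all coordinates in place so the median bond length becomes target. */
    static void rescale(double[][] coords, int[][] bonds, double target) {
        double factor = target / medianBondLength(coords, bonds);
        for (double[] p : coords) {
            p[0] *= factor;
            p[1] *= factor;
        }
    }

    public static void main(String[] args) {
        double[][] coords = {{0, 0}, {1, 0}, {1, 1}}; // drawn with unit bond length 1
        int[][] bonds = {{0, 1}, {1, 2}};
        rescale(coords, bonds, 1.5);
        System.out.println(medianBondLength(coords, bonds));
    }
}
```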

Also I presume you want *findSubstructure* rather than *findIdentical* (exact
match). You should also avoid the *count() > 1* check; this is very wasteful if
there are a lot of automorphisms in the query. Basically it says find them
all and count them and then re-find the first one for alignment. You can
completely remove that check as shown, but for the benefit of the mailing
list the correct way to write that if-condition is:


> *if (mappings.atLeast(1)) {}*
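
The difference is easiest to see with a lazy match source that counts how much work it does. This is a plain-Java sketch with a hypothetical Matches class standing in for the real mappings object; only the count-versus-at-least idea is taken from the discussion above.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

final class LazyMatchSketch {
    /** Hypothetical lazy match source: produces matches on demand and counts the work done. */
    static final class Matches implements Iterable<Integer> {
        final int total;   // how many matches exist in all
        int generated = 0; // how many have actually been produced so far

        Matches(int total) { this.total = total; }

        @Override public Iterator<Integer> iterator() {
            return new Iterator<Integer>() {
                private int cursor = 0;
                @Override public boolean hasNext() { return cursor < total; }
                @Override public Integer next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    generated++;
                    return cursor++;
                }
            };
        }

        /** Eager: walks every match just to count them. */
        int count() {
            int n = 0;
            for (int ignored : this) n++;
            return n;
        }

        /** Lazy: stops as soon as n matches have been seen. */
        boolean atLeast(int n) {
            if (n <= 0) return true;
            int seen = 0;
            for (int ignored : this) {
                if (++seen >= n) return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        Matches eager = new Matches(1000);
        System.out.println((eager.count() > 0) + " after " + eager.generated + " matches");

        Matches lazy = new Matches(1000);
        System.out.println(lazy.atLeast(1) + " after " + lazy.generated + " matches");
    }
}
```

With 1000 available matches the eager check generates all 1000, while the at-least check stops after the first.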


Now the tricky part is working out when to/not to align atoms in generic
queries, for example: *C~C~O* matches both C=C=O and CCO; the first
should not be bent when laid out. Anyway, it's an open problem and for most
queries it will be fine.

Here's the final function. You probably also want the highlighting done at
the same time, but I have omitted that here:

https://gist.github.com/johnmay/12797a89f4186bc7da881f1f4a706671

    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;
    import javax.vecmath.Point2d;
    import org.openscience.cdk.exception.CDKException;
    import org.openscience.cdk.geometry.GeometryUtil;
    import org.openscience.cdk.interfaces.IAtom;
    import org.openscience.cdk.interfaces.IAtomContainer;
    import org.openscience.cdk.interfaces.IBond;
    import org.openscience.cdk.interfaces.IChemObject;
    import org.openscience.cdk.isomorphism.Mappings;
    import org.openscience.cdk.isomorphism.Pattern;
    import org.openscience.cdk.layout.StructureDiagramGenerator;

    public static void alignMoleculeToSubstructure(IAtomContainer mol,
                                                   IAtomContainer sub,
                                                   boolean fixBonds) throws CDKException {
        // substructure match rather than findIdentical (exact match)
        Pattern substructurePattern = Pattern.findSubstructure(sub);
        Mappings mappings = substructurePattern.matchAll(mol);
        Set<IAtom> fixedAtoms = new HashSet<>();
        Set<IBond> fixedBonds = new HashSet<>();
        for (Map<IChemObject, IChemObject> map : mappings.toAtomBondMap()) {
            // rescale the substructure to CDK's unit bond length of 1.5
            GeometryUtil.scaleMolecule(sub, 1.5 / GeometryUtil.getBondLengthMedian(sub));
            for (IChemObject substructureObject : map.keySet()) {
                IChemObject targetObject = map.get(substructureObject);
                if (targetObject instanceof IAtom) {
                    // set the target atom's position to that of the
                    // substructure atom and add it to the fixed atom list
                    IAtom targetAtom = (IAtom) targetObject;
                    IAtom substructureAtom = (IAtom) substructureObject;
                    targetAtom.setPoint2d(new Point2d(substructureAtom.getPoint2d()));
                    fixedAtoms.add(targetAtom);
                } else if (fixBonds && targetObject instanceof IBond) {
                    // only check bonds if needed;
                    // add the target bond to the fixed bond list
                    fixedBonds.add((IBond) targetObject);
                }
            }
            // only align to the first matching substructure
            break;
        }
        // generate coordinates for the molecule
        StructureDiagramGenerator sdg = new StructureDiagramGenerator();
        sdg.setMolecule(mol, false, fixedAtoms, fixedBonds);
        sdg.generateCoordinates();
    }


John


Re: [Cdk-user] MCS detection

2018-09-17 Thread John Mayfield
A couple of options,

a) Use the newer standalone version of SMSD, this is why the package is
deprecated. We did try to integrate the newer version but it proved
difficult and there were some test regressions. You can still use the
deprecated one.
b) Edmund Duesbury has some updated MCS algorithms based on CDK for his PhD.

John

On Sun, 16 Sep 2018 at 22:40, Rajarshi Guha  wrote:

> Looking at the 2.0 docs indicates that the SMSD classes for MCS detection
> have been deprecated.
>
> What is the recommended way to identify MCS's in 2.0?
>
> --
> Rajarshi Guha | http://blog.rguha.net | @rguha 
>
> ___
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user
>

