Re: [Rdkit-discuss] Removing solvent and ions from dataset

2020-06-08 Thread Nicolas Bosc
Hi Max,

Third alternative: https://github.com/chembl/ChEMBL_Structure_Pipeline 


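from chembl_structure_pipeline import standardizer
parent_molblock, _ = standardizer.get_parent_molblock(o_molblock)

This strips salts and solvents from the molecule, leaving the parent structure as a molblock.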

Nicolas


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Removing solvent and ions from dataset

2020-06-08 Thread Pierre-Marie Allard
Hi Max,

You can also use MolVS https://molvs.readthedocs.io/en/latest/
This should suit most of your needs,

PM
_

Pierre-Marie Allard
Research Assistant - Natural Products Chemistry
ISPSO - UniGe - Geneva
pierre-marie.all...@unige.ch
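
For readers who would rather stay inside RDKit: its rdMolStandardize module is largely a port of MolVS, so the fragment handling that MolVS offers can be sketched with RDKit alone. The snippet below is a minimal sketch and not code from this thread. The helper name `largest_fragment_smiles` and the example SMILES are made up for illustration, and it assumes that keeping only the largest fragment of each record is acceptable for your data.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# Picks the largest covalently bonded fragment of a molecule.
chooser = rdMolStandardize.LargestFragmentChooser()


def largest_fragment_smiles(smiles):
    """Return canonical SMILES of the largest fragment, or None if unparsable.

    Hypothetical helper for illustration only.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return Chem.MolToSmiles(chooser.choose(mol))


# Sodium acetate with one water: the counter-ion and the water are dropped.
print(largest_fragment_smiles("CC(=O)[O-].[Na+].O"))
```

Note that this keeps exactly one fragment per record, so a genuine two-component mixture would lose one of its components. The charge on the remaining fragment is left as it is.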



Re: [Rdkit-discuss] Removing solvent and ions from dataset

2020-06-08 Thread Francois Berenger


I have used this program several times:

https://github.com/flatkinson/standardiser

You can try this:
```
pip3 install chemo-standardizer
standardiser -i input.smi -o output_std.smi
```

I believe it uses RDKit under the hood.

Regards,
F.




[Rdkit-discuss] Removing solvent and ions from dataset

2020-06-06 Thread Max Pinheiro Jr
Hi RDKit team,

I am working on a chemically diverse dataset of SMILES strings and I need
to do some preprocessing to clean up the data before starting the
modeling part. I am looking for tools or built-in functions in RDKit that
can remove, for instance, solvent (water) molecules and ions. I found the
"SaltRemover" module, which may solve the problem of removing ions from the
dataset, but I could not find an equivalent module for solvent molecules.
Does anyone know a specific tool in RDKit (or any other Python program)
for this kind of preprocessing of SMILES strings? If so, could you please
provide a simple example of how to do it? I would be very thankful for any
help you can provide.

Max Pinheiro Jr
-
Université Aix-Marseille, France
Institut de Chimie Radicalaire
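
One way to approach the question above with RDKit alone: SaltRemover accepts its own pattern definitions, so a water pattern can be added next to the ion patterns. This is a minimal sketch under that assumption and not an answer given in the thread. The pattern list, the helper name `strip_smiles`, and the example dataset are hypothetical and should be adapted to your own data.

```python
from rdkit import Chem
from rdkit.Chem.SaltRemover import SaltRemover

# Hypothetical pattern list, one SMARTS per line. Extend it as needed.
PATTERNS = "\n".join([
    "[Cl,Br,I]",        # halide counter-ions (also matches HCl)
    "[Li,Na,K,Ca,Mg]",  # common metal cations
    "[OH2]",            # water
])

remover = SaltRemover(defnData=PATTERNS)


def strip_smiles(smiles):
    """Return canonical SMILES without the matching fragments.

    Returns None when the input cannot be parsed.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    # dontRemoveEverything=True keeps the last fragment even if it matches,
    # so a record that is only water does not become an empty molecule.
    stripped = remover.StripMol(mol, dontRemoveEverything=True)
    return Chem.MolToSmiles(stripped)


dataset = ["CCN.Cl", "CC(=O)[O-].[Na+].O", "c1ccccc1"]
cleaned = [strip_smiles(s) for s in dataset]
print(cleaned)
```

Only whole fragments that match a pattern are removed, so the hydroxyl oxygen of an alcohol is not affected by the water pattern. Other solvents would need their own SMARTS lines.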